CLONING OF ENTEROKINASE AND METHOD OF USE

FIELD OF THE INVENTION 

The present invention relates generally to the cloning and expression of enterokinase activity and 
to methods of its making and use. 

BACKGROUND OF THE INVENTION 

The use of fusion proteins as a tool for recombinant protein production is well known in the
biopharmaceutical industry. Fusing the coding sequence for a desired recombinant protein to that of a
well-expressed gene has several advantages. Most fusion protein strategies position the protein of
interest at the C-terminal end of the highly expressed fusion partner, which allows translation initiation
to occur on a "proven" gene sequence that is known to be well translated and can help ensure high 
expression levels. Some fusion partners can confer many advantageous attributes to the fusion protein, 
such as specific cellular localization, binding to affinity ligands to aid in purification and detection, and 
even proteolytic and conformational stability. 

While fusion proteins offer numerous advantages, this beneficial physical association of the 
protein domains can also be problematic when it becomes necessary to separate the two (or more) 
components from their covalent tethering. The method of protein cleavage must be both specific and 
efficient and must not yield unwanted side products. This is particularly so when utilizing a fusion 
protein approach for the production of biopharmaceuticals destined for human use. Ideally, the most 
useful method allows for cleavage at a specific target sequence without regard for the internal protein
sequence and/or without regard for the composition of the fusion partners. The method should produce
cleaved product with authentic N- and C-termini, should not modify or otherwise adulterate the desired
protein product, and should be tolerant to a wide range of conditions so that reaction components can
be tailored to the physical characteristics of the fusion protein without seriously affecting the efficiency
of the cleavage reaction. In addition, for biopharmaceutical production and applications, the cleaving
reagent should not be from an animal source due to concerns about contamination by infectious agents. 

An ideal choice for such a "universal" fusion protein cleaving method is use of the mammalian 
enzyme enterokinase (enteropeptidase). Enterokinase is the physiological activator of trypsinogen and 
cleaves with high specificity after the sequence (Asp)4-Lys. Light et al., J. Protein Chem. 10:475-480
(1991). It is possible to engineer the fusion protein to include a linker DNA sequence encoding the
amino acid sequence recognized by enterokinase. See, for example, Bollen et al., USPN 4,828,988 (May
9, 1988); Rutter, USPN 4,769,326 (September 6, 1988); and Mayne et al., USPN 4,745,069 (May 17,
1988). However, although extensive research efforts have been mounted by several different research
groups since the first partial purification of bovine enterokinase more than 15 years ago, no one has yet
been successful in cloning enterokinase. Porcine enterokinase was first isolated in the early 1970s
(Maroux et al., J. Biol. Chem. 246:5031 (1971)), and bovine (Anderson et al., Biochemistry 16:3354 (1977))
and human (Grant et al., Biochem. J. 155:243 (1976)) enterokinases were isolated in the late 1970s.
Liepnieks et al., J. Biol. Chem. 254:1677 (1979) described an enterokinase having 35% carbohydrate,
a molecular weight of 150,000, with a heavy (115,000) and light (35,000) chain connected by one or
more disulfide bonds. Subsequent studies of the light chain, i.e., the catalytic subunit, were reported
in Light et al., J. Biol. Chem. 259:13195 (1984). Most recently, Light et al., J. Protein Chem.
10:475 (1991), disclosed what was later proven to be an incorrect partial amino-terminal sequence for
the catalytic subunit of bovine enterokinase. To date, it has been impossible to obtain recombinantly 
produced enterokinase activity and there continues to exist a need for such a product. 

BRIEF SUMMARY 

The present invention provides novel purified nucleic acid sequences encoding enterokinase 
activity. Specifically provided is mammalian enterokinase activity, including human and bovine 
enterokinase and comprising the nucleic acid sequence as set forth in SEQ ID NO:1, encoding the
catalytic light chain, as well as portions of the heavy chain. The sequence comprises 2581 nucleotides
and includes the catalytic domain, i.e., nucleotides 1691 to 2398. A nucleotide sequence encoding this
enterokinase activity and contained in the plasmid designated pEK-2/GI734 was deposited with the
American Type Culture Collection (ATCC) on February 2, 1993 and accorded the accession number
69232. In a further embodiment, the invention comprises the expression products of the novel sequences
having enterokinase activity. 

Nucleic acid forms such as genomic DNA (gDNA), complementary DNA (cDNA), and DNA 
prepared by de novo chemical synthesis from nucleotides, as well as DNA with deletions or mutations, 
allelic variants and sequences that hybridize thereto under stringent conditions (or which would hybridize 
but for the redundancy of the genetic code) are also within the contemplation of the invention so long 
as they encode polypeptides having enterokinase activity as defined below. Also, forms which contain 
modifications of the catalytic site of enterokinase which may allow for alteration of the specific cleavage 
site recognized by the enzyme are included. Further provided are novel messenger RNA (mRNA) 
sequences corresponding to these DNA sequences. 

Association of nucleic acid sequences provided by the invention with homologous or 
heterologous species expression control sequences, such as promoters, operators, regulators, and the like, 
allows for in vivo and in vitro transcription to the corresponding mRNA which, in turn, allows 
translation of proteins and related poly- and oligo- peptides, in large quantities, having enterokinase 
activity. In a presently preferred expression system of the invention, enterokinase encoding sequences 
are operatively associated with a regulatory promoter sequence allowing for transcription and translation 
in a eukaryotic cell system to provide, e.g., enterokinase polypeptides having protease activity. The
novel nucleic acid sequences may optionally encode both the heavy chain and the light chain of 
enterokinase, or the light chain alone which surprisingly still provides enterokinase activity. The 
enterokinase activity of the invention may be generated from one or more expression vector(s) each 
comprising one or more portions of the enterokinase activity, or, alternatively, the enterokinase activity 
can be generated from one or more expression vector(s) contained in one or more cell lines, each of
which expresses all or a portion of the enterokinase activity. Thus, the heavy and light chains may be
separately expressed in separate cell lines if desired. In addition, the enterokinase activity can be
produced as a fusion protein, e.g., using thioredoxin as the fusion partner. Optionally, the fusion partner
can be all or part of yet another proteolytic enzyme, such as PACE, trypsinogen, and the like. Indeed, 
such an enterokinase fusion protein can contain an enterokinase cleavage site between the component 
protein domains, thereby allowing autocatalytic processing to separate the two domains and to yield 
mature, active enterokinase. 

Incorporation of these sequences into prokaryotic and eukaryotic host cells by standard
transformation and transfection processes is also within the contemplation of the invention and is
expected to provide useful enterokinase in quantities greatly in excess of those obtainable from tissue
sources. The use of appropriate host cells provides for such post-translational modifications, e.g.,
truncation, glycosylation, etc., when needed to confer optimal biological activity on the expression
products of the invention. Such appropriate host cells can include, for example, E. coli, CHO, yeast, and
lepidoptera cells.

Novel protein products of the invention include those having the primary structural conformation 
(i.e., amino acid sequence) of enterokinase comprising the sequence substantially as set forth in SEQ ID
NO:2 and having enterokinase protease activity. A presently preferred embodiment comprises the amino 
acid sequence substantially as set forth in SEQ ID NO:2 and specifically comprising amino acids 564 
to 798. Antibodies to such products are also provided. 

Also provided by the invention are methods for cleaving fusion proteins utilizing the novel 
protein products of the invention. These protein products can include both heavy and light chains or can 
be solely light chain enterokinase activity. Light chain alone is a "soluble" form of the enterokinase 
activity and is devoid of the non-enzymatic heavy chain which is believed to act as a membrane anchor 
in vivo. Surprisingly, while this form (light chain alone) of enterokinase is a poorer enzyme on 
trypsinogen, it is much more effective on fusion proteins. Provided also is a production method wherein 
one of the fusion protein members is itself enterokinase activity, which, upon cleavage of the fusion 
protein domains at a strategically located enterokinase recognition site, yields additional enterokinase 
activity at each round of cleavage to cleave more fusion protein. 
sequences or may employ sequences having deletions from and/or mutations in the sequences but which
still encode an enterokinase activity as described above. 

In one embodiment of the invention, the enterokinase activity is the protein encoded by the 
nucleotide sequence set forth in SEQ ID NO:1 and includes the mature catalytic domain, i.e., nucleotides
1691 to 2398. As used herein, the term "a sequence substantially as" set forth in a SEQ ID NO is meant
to encompass those sequences which hybridize to the sequence under stringent conditions as well as those
which would hybridize but for the redundancy of the genetic code. Stringent conditions are generally
0.2X SSC plus 0.1% SDS at 65°C. The terms "substantially duplicative" and "substantially
corresponding" are meant to include those sequences which, though they may not be identical to those
set forth in a SEQ ID NO, still result in expression product, proteins, and/or synthetic polypeptides that
have enterokinase activity. Thus, using the nucleotide sequence as set forth in SEQ ID NO:1, DNA
encoding enterokinase activity can be isolated and cloned from other sources as well using appropriate
vectors, selectable markers and recombinant DNA techniques. The corresponding cDNA can be
prepared from appropriate mRNA sources. Genomic DNA encoding enterokinase activity may also be
obtained from a genomic library using a cDNA probe or oligonucleotide probes. Alternatively, an
enterokinase activity-encoding DNA sequence may be prepared synthetically. The use of intron-less
sequences, e.g., cDNA sequences, is preferred, as bacterial expression requires intron-less sequences. The
sequence may also be modified appropriately for expression in bacteria as described, supra.

The present invention also provides a method for producing enterokinase activity preferably in 
non-glycosylated form. The method involves culturing a host cell, preferably bacterial, transformed with 
(i.e., containing and capable of expressing) a DNA sequence encoding the enterokinase activity which
is under the expression control of suitable transcriptional control sequences. The DNA sequence may
encode both the heavy and light chains, or only light chain, or only as much as is required to result in
the expression of enterokinase activity and may be deliberately designed to include preferred codons for
expression in bacterial cells as is well known in the art. In the latter case, the resulting expression
product of such deliberately designed DNA sequences may contain full length and may also contain a
truncated, biologically active, mature peptide sequence encoding enterokinase activity, e.g., light chain
alone. 

In another preferred method for expression of enterokinase activity, the DNA sequence encoding 
the catalytic domain of enterokinase is fused to a signal peptide (pre-region) and pro-region of a gene, 
such as the human PACE gene. PACE is a serine protease which cleaves after dibasic residues and is 
responsible for propeptide processing of a number of secreted proteins. When the PACE signal peptide 
(pre-region) and pro-region coding sequence is fused in-frame to the mature enterokinase light chain 
coding sequence and expressed in mammalian cells, e.g., CHO cells, COS cells, BHK cells, and the 
like, the sequence is translated to produce a chimeric protein which is secreted and which is then 
processed to remove the signal peptide thereby yielding pro-enterokinase; subsequent cleavage, by either
endogenous or exogenous PACE, removes the pro-peptide from the N-terminus of the enterokinase and
mature enterokinase activity is secreted into the conditioned medium. Optionally, as a source of PACE,
this method may employ co-expression of a modified, soluble form of the PACE gene having the
transmembrane domain of PACE deleted. See, for example, Hatsuzawa et al., J. Biol. Chem.
267:16094 (1992) for a description of soluble PACE and delineation of the pro-peptide portion of the
protein. Other pre/pro regions can also be used to similar advantage in expressing enterokinase activity,
for example, the pre/pro region of yeast Kex2 as described in Brenner et al., Proc. Natl. Acad. Sci.
U.S.A. 89:922 (1992), or the pre/pro region of trypsinogen as described in LeHeuron et al., Eur. J.
Biochem. 193:767 (1990).

As used herein, the term "pro-protein" means a protein having attached to it a "pro" region; a 
"pre-pro-protein" has a "pre-pro" region attached to it. The "pre" region, or signal peptide, refers to 
the most N-terminal stretch of amino acids, which targets the remaining portion of the polypeptide to be
translocated across a membrane, e.g., the endoplasmic reticular membrane, and is usually subsequently
cleaved by an endogenous signal peptidase.

The "pro" region is an intervening region between the signal peptide (pre-region) and the mature 
protein. This sequence may be responsible for enhancing some post-translational modifications; it may 
be necessary for proper folding, or it may act to inhibit the activity of the mature protein until it is 
removed post-translationally. The pro region is usually removed after signal peptide cleavage by an 
endoprotease. A "pre/pro" region is a combination of the "pre" region and the "pro" region as described 
above. More specifically, useful DNA constructs include fusions of DNA encoding enterokinase activity 
with the pre/pro region of trypsinogen. The signal peptide and the entire 8 amino acid pro region of
bovine anionic trypsinogen (which includes an enterokinase recognition site) are fused to the amino
terminus of the mature enterokinase catalytic domain. Yet another DNA construct involves fusion of
the mature enterokinase catalytic domain to the C-terminus of E. coli thioredoxin, having an intervening
spacer sequence encoding a known cleavage site such as an enterokinase cleavage site.
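
The role of an engineered enterokinase cleavage site in a fusion construct can be illustrated with a short sketch. It assumes only what is stated in the Background, namely that enterokinase cleaves after the sequence (Asp)4-Lys (DDDDK in one-letter code). The fusion partner and linker residues below are hypothetical placeholders, not sequences taken from this disclosure, and the function models sequence recognition only, not enzyme kinetics or accessibility of the site.

```python
from typing import List

# One-letter code for the (Asp)4-Lys recognition sequence cited in the Background.
EK_SITE = "DDDDK"


def cleave_at_enterokinase_sites(protein: str) -> List[str]:
    """Split a one-letter protein sequence after every DDDDK motif."""
    fragments = []
    start = 0
    pos = protein.find(EK_SITE)
    while pos != -1:
        cut = pos + len(EK_SITE)  # cleavage occurs after the Lys residue
        fragments.append(protein[start:cut])
        start = cut
        pos = protein.find(EK_SITE, cut)
    fragments.append(protein[start:])  # remainder downstream of the last site
    return fragments


# Hypothetical fusion: placeholder partner + placeholder linker + site + product.
# "IVGGSD" is the N-terminus reported for the catalytic chain in Example 1.
fusion = "MKTAYIAK" + "GSG" + EK_SITE + "IVGGSD"
print(cleave_at_enterokinase_sites(fusion))
```

Because the cut falls immediately after the Lys, the downstream fragment begins with its own first residue, which is why a site placed directly upstream of the desired protein leaves that protein with an authentic N-terminus.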

The DNA sequence encoding enterokinase activity may be inserted by conventional methods into 
an expression vector suitable for the desired host cell as is well known in the art. For bacterial or yeast 
production, the DNA sequence should not contain introns. For higher eukaryotic expression, it is not 
necessary to avoid introns, but cDNA sequences are preferred. Preferably for eukaryotic expression, 
the DNA sequence should contain a secretory leader sequence. The vectors should contain typical vector 
elements well known in the art including replication sites, selectable markers and transcriptional control 
sequences compatible with the chosen host. 

Various strains of E. coli useful as host cells for the production of non-glycosylated,
homogeneous enterokinase activity are also well-known in the art. A non-exclusive list of such strains
includes MC1061, DH1, RR1, C600hfl, K803, JA221, HB101, JM101 and various K12 strains,
including the strain used in the Examples. Alternatively, other bacterial species may be used, including
B. subtilis, various strains of Pseudomonas, other bacilli and the like.

Enterokinase activity may also be produced by heterologous expression of an enterokinase
activity encoding sequence in mammalian cells. Enterokinase activity is thus obtainable in glycosylated
form, that is, unless glycosylation is prevented. Where desired, glycosylation can be inhibited by
tunicamycin or by site-directed mutagenesis of glycosylation sites, as is well known in the art. Suitable
mammalian expression vectors and host cells for production of enterokinase activity are also well known
in the art and include, without limitation, the vectors pXM and pMT2 and Chinese hamster ovary (CHO)
cells, monkey COS-1 cells, CV-1, HeLa, mouse L-929, 3T3 cells and BHK cells. The construction and
use of some exemplary mammalian vectors and cell lines is well known to those skilled in the art and
is discussed in detail in WO 88/00598.

Many strains of yeast cells, known to those skilled in the art, are also available as host cells for
expression of the enterokinase activity of the present invention. Yeast cells are especially useful as a
host for the PACE pre/pro fusion to mature enterokinase as described above. When expressed using a
suitable yeast vector, the fusion is secreted by virtue of the PACE signal peptide, and the PACE pro
region is subsequently processed by the endogenous yeast protease KEX2, an enzyme homologous to
human PACE which also cleaves after paired basic residues. Additionally, where desired, insect cells
may be used as host cells. See, for example, Miller et al., Genetic Engineering 8:277-98 (Plenum Press
1986) and references cited therein.

When the enterokinase activity of this invention is expressed in bacterial cells, it may be
expressed intracellularly, usually without regard to refolding since that is typically unnecessary to obtain
the protein in active form, or it may be secreted from bacterial cells in active form, if a secretory leader
is included. Where necessary or desired, as when reduced bioactivity is observed, the enterokinase
activity product may be refolded by conventional methods such as incubation of protein in urea or
guanidine HCl with dithiothreitol or β-mercaptoethanol, followed by dilution to reduce the concentration
of these reagents and treatment with oxidizing agents.

For example, E. coli cells, genetically engineered to express an enterokinase activity DNA
sequence as described herein, are cultured under suitable conditions permitting the production and
intracellular accumulation of enterokinase activity protein. The cells are then harvested, i.e., separated
from the medium in which they were cultured and from any other materials, and lysed, and the desired
biologically active enterokinase activity protein is purified from the lysate. Optionally, only minimal
purification of the enterokinase activity is required.

The term "biologically active" means a preparation of enterokinase activity that exhibits a 
detectable level of proteolytic cleavage activity as assayed by conventional methods discussed, supra.

Various purification techniques, such as column chromatography (e.g., ion exchange, immunoaffinity,
etc.), affinity purification on soybean trypsin inhibitor (STI), pancreatic trypsin inhibitor (PTI) or PABA,
gel filtration and reverse phase HPLC, are useful in purifying the desired protein. See, for example,
Gospodarowicz et al., J. Cell. Phys. 122:323-32 (1985), Iwane et al., Biochem. and Biophys. Res.
Comm. 146:470-77 (1987), Fox et al., J. Biol. Chem. 263:18452-58 (1988), EP 0 259 953 published June
4, 1987, and EP 0 237 966 published September 23, 1987.

The enterokinase activity of the invention can be used in a method for cleaving protein having 
an enterokinase cleavage site, and especially fusion proteins having such a cleavage site engineered into 
their sequence. The amounts needed are readily determined empirically by one skilled in the art. 
Indeed, as described herein, recombinant bovine enterokinase catalytic domain is a superior reagent for 
cleavage of fusion proteins when compared to the bovine-derived two-chain form, as it is much more 
efficient and is not contaminated with trace amounts of other proteolytic proteins which are difficult to 
remove. As another aspect of the invention, the enterokinase activity of the invention is incorporated 
as one of the fusion protein partners to yet another protein. As such, with the addition of a minimal 
amount of exogenous enterokinase activity to the reaction vessel (or by merely concentrating the fusion 
protein adequately), a minimal amount of cleavage of the fusion protein results in the release of 
additional enterokinase activity which in turn can catalyze many more proteolytic cleavages of fusion 
proteins. In this way, large amounts of enterokinase activity can be produced from a fusion protein in 
an autocatalytic manner. Also provided by the invention is a method for producing proteins from fusion 
proteins which comprises the steps of: 

(a) growing, in culture, a host cell transformed or transfected with 

(i) a nucleic acid which encodes enterokinase activity and which upon expression 
is segregated into the periplasmic space; and 

(ii) one or more nucleic acids which encode a fusion protein and an enterokinase 
cleavage site and which, upon expression are segregated to the cytoplasmic space, 

(b) allowing said periplasmic space and said cytoplasmic space to co-mingle thereby, 

(c) allowing said enterokinase activity to cleave said fusion protein, and 

(d) resulting in protein production. 

Pharmaceutical compositions containing the homogeneous enterokinase activity of the present 
invention may be useful as digestive agents. Such pharmaceutical compositions may also contain 
pharmaceutically acceptable carriers, diluents, fillers, salts, buffers, stabilizers and/or other materials
well-known in the art. The term "pharmaceutically acceptable" means a material that does not interfere
with the effectiveness of the biological activity of the active ingredient(s) and that is not toxic to the host
to which it is administered. The characteristics of the carrier or other material will depend on the route 
of administration. Administration can be carried out in a variety of conventional ways. Oral 



WO 94/16083 



PCT/US94/00616 




administration is preferred. In such case, the enterokinase activity of the present invention can be 
enterically coated, the preparation of which is within the skill in the art. In practicing the method of 
treatment of this invention, a therapeutically effective amount of enterokinase activity is administered. 
The term "therapeutically effective amount" means the total amount of each active component of the 
method or composition that is sufficient to show a meaningful benefit, i.e., restoration of digestive
function. When applied to an individual active ingredient, administered alone, the term refers to that 
ingredient alone. When applied to a combination, the term refers to combined amounts of the active 
ingredients that result in the therapeutic effect, whether administered in combination, serially or 
simultaneously. The number of applications may vary, depending on the individual and the severity of 
the digestive disorder. In yet another method of use, it is contemplated that the DNA encoding 
enterokinase would be useful in gene therapy as a means of correcting digestive disorders due to 
enterokinase deficiency.

The invention is further described in the following examples, which are intended to illustrate the 
invention without limiting its scope. Example 1 describes the cloning of a 26 bp bovine enterokinase 
gene fragment. Additional protein sequencing of bovine enterokinase is described in Example 2. The 
amplification and cloning of a gene fragment adjacent to the Example 1 fragment is the subject of 
Example 3. Example 4 relates to the cloning of the enterokinase catalytic chain. A comparison of the 
different cDNA clones, as well as a partial coding sequence for the non-catalytic (heavy) chain is set 
forth in Example 5. Example 6 describes the isolation of additional enterokinase coding sequence 
including additional heavy chain sequence. Example 7 describes the use of the bovine enterokinase 
sequence to clone other mammalian enterokinase genes. Example 8 describes the expression of a gene 
encoding the catalytic domain of bovine enterokinase in both a prokaryotic cell system and a
eukaryotic cell system. Example 9 relates to the co-expression of fusion proteins and to the production
of active enterokinase. Example 10 relates to the use of enterokinase as a therapeutic agent in the 
treatment of certain digestive disorders. 
EXAMPLE 1

CLONING OF A BOVINE 
ENTEROKINASE GENE FRAGMENT 

A purported N-terminal 27 amino acid sequence of the catalytic (light) chain of bovine
enterokinase was provided by Albert Light of Purdue University and was later published in Light et al.,
supra. As discussed in greater detail, infra, this sequence was incorrect. Because of the error, the
tyrosine reported at position 8 was used in designing probes and primers due to its low degeneracy (only
two possible codons encode tyrosine). However, the actual residue at position 8 is in fact arginine, with
six possible codons. This (erroneous) sequence is as follows:

SEQ ID NO:3

1       10        20     27
IVGGSDSYEGAWPWVVALYFDDQQVCG
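
The codon-degeneracy reasoning above, and the residue numbering used for the primer targets in the remainder of this Example, can be checked with a minimal sketch. It assumes the standard genetic code and treats SEQ ID NO:3 as 27 contiguous residues, which is consistent with the 1 to 27 numbering. The variable and function names are illustrative only.

```python
from collections import Counter
from itertools import product

BASES = "TCAG"
# Standard genetic code, one letter per codon, codons ordered TTT, TTC, TTA, TTG, TCT, ...
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

# Map each codon to its amino acid, then count codons per amino acid.
CODON_TABLE = dict(zip(("".join(c) for c in product(BASES, repeat=3)), AMINO_ACIDS))
CODONS_PER_RESIDUE = Counter(CODON_TABLE.values())

# SEQ ID NO:3 as reported (position 8 was later found to be Arg, not Tyr).
PEPTIDE = "IVGGSDSYEGAWPWVVALYFDDQQVCG"


def positions(sub: str) -> tuple:
    """Return the 1-based (first, last) residue numbers of sub within PEPTIDE."""
    first = PEPTIDE.index(sub) + 1
    return first, first + len(sub) - 1


print(CODONS_PER_RESIDUE["Y"], CODONS_PER_RESIDUE["R"])  # 2 6
for target in ("IVGGSD", "YEGAWP", "FDDQQV", "DQQVCG"):
    print(target, positions(target))
```

The two-versus-six codon difference is why a tyrosine at position 8 looked attractive for primer design, and why the misassignment was costly: a pool built around Tyr cannot match any of the six Arg codons.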



The "provided" sequence was backtranslated into all possible DNA codons which it could
encode, and was used to design pools of oligonucleotide primers 17 base pairs in length with 5'
extensions to encode restriction endonuclease cleavage sites to be used as primers in PCR reactions [Saiki
et al., Science 230:1350-1354 (1985); Mullis et al., Cold Spring Harbor Symposia on Quantitative
Biology, Vol. 51:263-273 (1986)]. The design of these oligonucleotide pools was critical to the potential
success of the endeavor. Comparison of this N-terminal protein sequence to previously identified
sequences in the databases revealed significant homology to a large number of mammalian pancreatic
and serum serine proteases. To prevent unwanted amplification of DNA sequences encoding these
"unwanted" proteins, the PCR primer pools were designed to intentionally avoid these highly
homologous regions. However, the competing requirement of spacing the sequences to which the primer
pools anneal as far apart as possible was taken into account to maximize the amount of exact
enterokinase sequence generated for the amplification to provide useful information.

Two degenerate oligonucleotide pools, which together contained all possible codons for the
N-terminal amino acid sequence: IVGGSD (amino acids 1-6) SEQ ID NO:8, were synthesized. These two
pools differed only in the codons used for the serine residue in the protein sequence and were used
independently as a means of decreasing the degeneracy of each pool:

SEQ ID NO:4

PRIMER 1A 5' CTCGAATTCATHGTNGGNGGNTCNGA 3' 768x
and
SEQ ID NO:5

PRIMER 1B 5' CTCGAATTCATHGTNGGNGGNAGYGA 3' 384x
As used herein, the symbol "H" refers to equal proportions of nucleotides C, T, and A. The symbol
"Y" refers to equal proportions of nucleotides C and T. The symbol "R" refers to equal proportions of
either A or G at that position. The symbol "N" refers to equal proportions of the four nucleotides G, A,
T, and C. Each of these pools had a 5' extension which contained an EcoRI site (GAATTC).
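
The degeneracy figures quoted beside each primer (768x, 384x) follow directly from these symbol definitions: the number of distinct sequences in a pool is the product of the number of bases allowed at each position. The following is a minimal sketch of that arithmetic; the dictionary and function names are illustrative, and only the symbols used in these primers are included.

```python
# Bases represented by each symbol, as defined in the text above.
SYMBOL_BASES = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG",    # A or G
    "Y": "CT",    # C or T
    "H": "ACT",   # C, T, or A
    "N": "ACGT",  # any of the four nucleotides
}


def degeneracy(primer: str) -> int:
    """Number of distinct sequences represented by a degenerate primer pool."""
    count = 1
    for symbol in primer:
        count *= len(SYMBOL_BASES[symbol])
    return count


PRIMER_1A = "CTCGAATTCATHGTNGGNGGNTCNGA"  # SEQ ID NO:4
PRIMER_1B = "CTCGAATTCATHGTNGGNGGNAGYGA"  # SEQ ID NO:5

print(degeneracy(PRIMER_1A))  # 768
print(degeneracy(PRIMER_1B))  # 384
```

Splitting the six serine codons into a TCN pool and an AGY pool keeps each pool smaller than a single combined pool would be, which is the stated reason for using the two pools independently.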

Another pool of oligonucleotides was synthesized which contained the reverse complement of 
all possible codons for the most C-terminal known sequence: 

SEQ ID NO:6 

DQQVCG (amino acids 22-27 of SEQ ID NO:3).
This pool contained a common 5' extension which contained a HindIII site (AAGCTT):

SEQ ID NO:7 

PRIMER 2 5' TCCAAGCTTCCRCANACYTGYTGRTC 3' 64x 

The DNA products from the first series of amplification reactions were used as the template for 
a second series of amplifications primed by oligonucleotide pools which are complementary to the 
inferred DNA coding sequence of amino acids "interior" to the first set in the linear sequence. Thus, 
a pool of 17 base pair oligonucleotides complementary to all possible codons for the sequence: 

SEQ ID NO: 8 

YEGAWP (which corresponds to amino acids 8-13 of SEQ ID NO: 3, including the incorrect 
assignment of Y at position 8) was synthesized: 

SEQ ID NO:9 

PRIMER 3 5' TAYGARGGNGCNTGGCC 3' 64x

This pool of primers was then combined in the second PCR reaction with another pool comprising the 
reverse complement of all possible codons for the sequence: 

SEQ ID NO: 10 

FDDQQV (corresponds to amino acids 20-25 of SEQ ID NO:3) 

SEQ ID NO: 11

PRIMER 4 5' TCCAAGCTTACYTGYTGRTCRTCRAA 3' 32x

This pool has partial overlap with the 3' pool used in the first series of amplifications, and contains a
5' extension which includes a HindIII site (AAGCTT).
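
The two antisense pools (Primers 2 and 4) can be reconstructed from their target peptides as a consistency check. The sketch below back-translates each peptide into degenerate codons, truncates to the 17 bases used for the annealing portion, takes the reverse complement, and prepends the 9-base 5' extension seen in both printed primers. The codon map covers only the residues needed here, and all names are illustrative.

```python
# Degenerate codons (standard genetic code) for the residues in the two target peptides.
DEGENERATE_CODON = {"D": "GAY", "Q": "CAR", "V": "GTN", "C": "TGY", "G": "GGN", "F": "TTY"}

# Complements of the plain and degenerate symbols used above.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G", "R": "Y", "Y": "R", "N": "N"}

HINDIII_EXTENSION = "TCCAAGCTT"  # 5' extension carrying the HindIII site AAGCTT
ANNEALING_LENGTH = 17            # length of the coding portion of each primer


def back_translate(peptide: str) -> str:
    """Degenerate sense-strand sequence for peptide, truncated to 17 bases."""
    return "".join(DEGENERATE_CODON[aa] for aa in peptide)[:ANNEALING_LENGTH]


def reverse_complement(seq: str) -> str:
    """Reverse complement of a sequence written with degenerate symbols."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def antisense_primer(peptide: str) -> str:
    """5' extension followed by the reverse complement of the back-translated peptide."""
    return HINDIII_EXTENSION + reverse_complement(back_translate(peptide))


print(antisense_primer("DQQVCG"))  # TCCAAGCTTCCRCANACYTGYTGRTC (Primer 2)
print(antisense_primer("FDDQQV"))  # TCCAAGCTTACYTGYTGRTCRTCRAA (Primer 4)
```

Both reconstructed strings agree with the printed primers, and the shared DQQV codons explain the partial overlap between the two 3' pools.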

Bovine genomic DNA (0.9 mg/ml in dH2O) was boiled for 5 minutes to denature and was then
immediately placed on ice. Reaction conditions for each 50 µl amplification reaction were: 2 heat
denatured bovine genomic DNA, 10 mM Tris-HCl pH 8.3, 50 mM KCl, 1.5 mM MgCl2, 0.01% gelatin,
1.0 µM of each oligonucleotide pool, 200 µM of each dNTP, and 1 unit of Amplitaq DNA polymerase
(Perkin-Elmer Cetus). Forty amplification cycles were run under the following conditions: cycle 1 =
94°C, 3 minutes/40°C, 1 minute/72°C, 1 minute. Cycles 2-40 = 94°C, 1 minute/40°C, 1
minute/72°C, 1 minute. The first round of 40 cycles utilized either primer pools 1A and 2 or pools 1B
and 2. After 40 cycles of amplification, 0.5 µl of this reaction was used as template for a second 35
cycles of PCR using primer pools 3 and 4. The conditions for this round of 35 PCR cycles were 94°C,
1 minute/35°C, 2 minutes/72°C, 2 minutes. Reaction components were the same as the first round
except for the DNA template. The DNA template in the second round was the product of the previous
round.

PCR products obtained as described above were run on 5% acrylamide preparative gels, and
bands were stained with 0.5 µg/ml ethidium bromide, excised from the gel and electroeluted. DNA
manipulations and ligations were performed using standard techniques [Sambrook et al., in "Molecular
Cloning, a Laboratory Manual," second edition, Cold Spring Harbor Laboratory Press (1989)]. PCR
products were first treated with Klenow fragment of DNA polymerase I in the presence of all four
deoxynucleotide triphosphates, then digested with HindIII (New England Biolabs) and subcloned into a
pUC19 [Norrander et al., Gene 26:101-106 (1983)] HincII-HindIII vector. Transformants were identified
which contained plasmids with an apparent insert of approximately 72 bp. These plasmids were isolated
and their inserts were sequenced using the Sequenase kit (United States Biochemical) and a sequencing
primer which anneals to pUC19. The DNA sequence of the inserts was then translated to reveal an open
reading frame which corresponded exactly to the amino acid sequence predicted by the known protein
sequence (WVVALY, amino acids 14-19 of SEQ ID NO: 3). Due to the possibility of mismatch
tolerance during primer annealing, only the sequence between the two PCR primers could be assumed
to be correct. However, it was assumed that the proper serine codon was in primer pool 1B (AGY),
as the other pool (1A) failed to yield a specific product; this was determined by Southern blot of product
from the first 40 cycles, probed with the pool designated SEQ ID NO:9. Also, since the "wobble" position
of the codon for Pro13 was determined to be a thymidine, and there is only one possible codon for the
adjacent Trp12, 5 additional bases were also assumed to be fairly certain. When the first two invariant
bases of the codon for Phe20 are included, 26 contiguous base pairs of coding sequence for nine amino
acids of the enterokinase catalytic chain (amino acids Trp12 to Phe20) had been determined with a fair
degree of certainty. This sequence is nucleotides 1724 to 1749 of SEQ ID NO:1.
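
The nucleotide coordinates quoted above follow from simple codon arithmetic. The Python sketch below is a worked check rather than part of the described method; it assumes that the first codon of the mature catalytic chain (Ile1 of the IVGG N-terminus) begins at nucleotide 1691 of SEQ ID NO:1, which is the start coordinate given for the catalytic domain in Example 8. The function name `codon_span` is hypothetical.

```python
from typing import Tuple

# Assumption: first base of the Ile1 codon, taken from the catalytic-domain
# start coordinate (nucleotide 1691) quoted in Example 8.
MATURE_START_NT = 1691


def codon_span(residue: int, start_nt: int = MATURE_START_NT) -> Tuple[int, int]:
    """Return the 1-based (first, last) nucleotide positions of a residue's codon."""
    first = start_nt + 3 * (residue - 1)
    return first, first + 2


first_nt, _ = codon_span(12)        # Trp12: all three bases are fixed (TGG)
_, phe_last = codon_span(20)        # Phe20: TTY, so the third base is not fixed
last_nt = phe_last - 1              # keep only the two invariant bases of Phe20
length = last_nt - first_nt + 1     # number of contiguous bases determined

if __name__ == "__main__":
    print(first_nt, last_nt, length)
```

Nine codons would span 27 bases; dropping the variable third base of the Phe20 codon leaves the 26 bases stated in the text.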

EXAMPLE 2 

PROTEIN SEQUENCING OF BOVINE ENTEROKINASE

The exact DNA sequence (26 bp) of amino acids 12-20 of mature bovine enterokinase light chain
was not sufficient to allow cDNA isolation by a hybridization approach. Accordingly, additional
adjacent protein sequence was sought.

Bovine enterokinase (EK-2 grade) was purchased from Biozyme. The enzyme was greater than
99% impure, thus the enzyme was further purified using porcine pancreatic trypsin inhibitor (Sigma)
coupled to activated Sepharose CL-4B (Sigma) [Liepnieks et al., J. Biol. Chem. 254:1677-1683 (1979)].
The resulting enzyme was reduced and alkylated to separate the heavy chain from the light chain and
run on a preparative acrylamide gel. The proteins were electroblotted from the gel onto a Problot
membrane (Applied Biosystems, Inc.), and the catalytic chain of Mr 42,000 daltons was excised from
the membrane after staining and was sequenced using an Applied Biosystems Model 470A pulse liquid
sequencer. The sequence for the first 30 amino acids was determined and is:

SEQ ID NO: 12

1       10        20        30
IVGGSDSREGAWPWVVALYFDDQQVCGASL

Of particular note is the observation that the amino acid residue in the 8th position was determined to be
an arginine, in contrast to the tyrosine incorrectly reported by Light et al., supra. This is a crucial area
of the sequence for designing PCR primers due to its reported low degeneracy.
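
The practical consequence of the residue 8 correction for primer design can be illustrated with codon counts from the standard genetic code. The sketch below is an illustration under stated assumptions, not a reconstruction of any primer actually used: the seven-residue window (residues 6-12 of SEQ ID NO:12) is chosen arbitrarily, the "reported" window simply substitutes Tyr at position 8 and assumes the remaining residues are unchanged, and the names `pool_degeneracy` and `min_mismatches` are hypothetical.

```python
from math import prod

# Number of codons per amino acid in the standard genetic code
# (only the residues needed for this example).
CODON_COUNT = {"D": 2, "S": 6, "R": 6, "Y": 2, "E": 2, "G": 4, "A": 4, "W": 1}

# Explicit codon sets for the two residues being compared.
CODONS = {
    "Y": ["TAT", "TAC"],
    "R": ["CGT", "CGC", "CGA", "CGG", "AGA", "AGG"],
}

WINDOW_MEASURED = "DSREGAW"   # residues 6-12 of SEQ ID NO:12 (Arg at position 8)
WINDOW_REPORTED = "DSYEGAW"   # hypothetical: same window with Tyr at position 8


def pool_degeneracy(peptide: str) -> int:
    """Number of distinct oligonucleotides needed to cover every back-translation."""
    return prod(CODON_COUNT[aa] for aa in peptide)


def min_mismatches(aa_a: str, aa_b: str) -> int:
    """Smallest number of base differences between any codon of aa_a and any of aa_b."""
    return min(
        sum(x != y for x, y in zip(a, b))
        for a in CODONS[aa_a]
        for b in CODONS[aa_b]
    )


if __name__ == "__main__":
    print(pool_degeneracy(WINDOW_REPORTED), pool_degeneracy(WINDOW_MEASURED))
    print(min_mismatches("Y", "R"))
```

Under these assumptions a pool built on tyrosine looks three-fold less degenerate than one built on arginine, but every member of it carries at least two mismatches against any arginine codon at that position.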

Two additional bands were observed upon electroblotting. The expected heavy chain band at Mr
150,000 daltons and another band at Mr 90,000 daltons were excised from the Problot membrane, treated
individually with trypsin, and the resulting fragments separated on reverse phase. Well-separated peaks
were collected and sequenced.

The reduced and alkylated bovine enzyme was also run on a C4 reverse phase column (Vydac)
to separate the non-catalytic (heavy) chain from the catalytic (light) chain. The peak corresponding to
the catalytic chain was treated with TPCK-trypsin (Worthington). The resulting tryptic peptides were
separated on C18 reverse phase HPLC (Vydac). Individual peaks were subjected to sequence analysis
on the protein sequencer. The results are presented in Example 3.

EXAMPLE 3

AMPLIFICATION AND CLONING
OF AN ADJACENT GENE FRAGMENT

Tryptic digestion and chromatographic separation and isolation of individual peptide fragments
of enterokinase catalytic chain, followed by subsequent sequencing of each resulting peptide, resulted
in the following sequences:

SEQ ID NO: 13 

EGAWPWVVALYFDDQQVCGASLVS 

SEQ ID NO: 14 

DWLVSAAHCVYGR 

SEQ ID NO: 15 

FTEWIQSFLH 

SEQ ID NO: 16 

ICSIAGWGALIYQGSTADVLQEA

SEQ ID NO: 17 

WLLAGVTSFGYQCALPN(N?)PGVYA 

SEQ ID NO: 18 

NMEPSK 

SEQ ID NO: 13 is a 24 residue peptide that partially overlapped with the N-terminal sequence as
determined in Example 2. These peptide sequences were used to search protein sequence databases for
homology. The protein which displayed the highest degree of sequence homology to the N-terminal
peptide sequence of enterokinase was an inferred amino acid sequence from a human liver cDNA clone,
named hepsin [Leytus et al., Biochemistry 27:1067-1074 (1988)]. Using the hepsin sequence as a guide,
another enterokinase catalytic chain tryptic peptide (SEQ ID NO: 14) appeared likely to be contiguous
with the N-terminal/overlapping tryptic sequence already identified. This peptide contained a sequence
highly homologous to the histidine region of the "catalytic triad" which is characteristic of serine
proteases. Oligonucleotide pools which were complementary to the reverse complement of the
backtranslated amino acid sequence for a region of this peptide, AHCVY, were synthesized. These
oligos also contained a 5' extension (shown in bold) which encodes a BamHI site:

EXAMPLE 4

CLONING OF ENTEROKINASE CATALYTIC CHAIN 

Two separate bovine small intestine cDNA libraries were used for the cloning of the gene for
the enterokinase catalytic domain. PCR was performed on λ libraries of bovine liver and small intestine
cDNAs using exact primers designed to this newly determined nucleotide sequence as described supra
in Example 3. cDNA from bovine liver gave a very weak product, implying that the abundance in the
library was very low, while the small intestine library yielded much more specific product. Thus bovine
small intestine was chosen as a possible mRNA source. The first cDNA library was a λgt10 library
which was purchased from Clontech. The second cDNA library, referred to as the Lambda Zap library,
was prepared as follows. Bovine duodenal tissue was obtained and mRNA was prepared from a portion
of the tissue using the guanidinium extraction method [Chirgwin et al., Biochemistry 18:5294 (1979)].
Oligo (dT)-primed cDNA was synthesized using standard techniques [Sambrook et al., supra].
Synthetic NotI/EcoRI adapters (Invitrogen) were ligated to the resulting cDNA, which was then ligated
into Lambda Zap II EcoRI arms (Invitrogen).

Recombinant phage from either cDNA library were hybridized, in duplicate, to two separate
oligonucleotides whose sequences were complementary to the enterokinase DNA sequence determined
from the subcloned PCR fragments of Example 3. The first oligonucleotide was 21 bases in length and
comprised the plus strand of the coding sequence for Asp6 to Trp12. The second oligonucleotide was 20
bases in length and comprised the minus strand of the coding sequence for residues Asp35 to Ala41.
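
Because the first probe is described as the plus strand of the coding sequence for Asp6 to Trp12, its sequence can be read directly from SEQ ID NO:1. The Python sketch below does this for nucleotides 1691 to 1740 as printed in the sequence listing, assuming that Ile1 begins at nucleotide 1691. The melting temperature estimate uses the Wallace rule of thumb (2°C per A/T, 4°C per G/C), which is a generic approximation for short oligonucleotides and is not taken from this document. The names `coding_slice` and `wallace_tm` are hypothetical.

```python
# Plus strand of SEQ ID NO:1, nucleotides 1691-1740, copied from the sequence
# listing in blocks of ten. Assumption: nucleotide 1691 is the first base of Ile1.
PLUS_1691_1740 = (
    "ATTGTCGGAG" "GAAGTGACTC" "CAGAGAAGGA" "GCCTGGCCTT" "GGGTCGTTGC"
)


def coding_slice(first_res: int, last_res: int, orf: str = PLUS_1691_1740) -> str:
    """Return the coding sequence for residues first_res..last_res (1-based, inclusive)."""
    return orf[3 * (first_res - 1): 3 * last_res]


def wallace_tm(oligo: str) -> int:
    """Rough melting temperature in degrees Celsius by the Wallace rule."""
    at = oligo.count("A") + oligo.count("T")
    gc = oligo.count("G") + oligo.count("C")
    return 2 * at + 4 * gc


probe_1 = coding_slice(6, 12)   # Asp6 to Trp12, plus strand

if __name__ == "__main__":
    print(probe_1, len(probe_1), wallace_tm(probe_1))
```

Seven codons give the 21 bases stated in the text. The second probe is not derived here, because its stated length (20 bases) is shorter than seven full codons and the text does not say which base was omitted.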

The oligonucleotides were labelled using [32P]-γATP and polynucleotide kinase [Sambrook et
al., supra]. Hybridizations were performed as described [Sambrook et al., supra] using the following
conditions: 6x SSC, 0.5% SDS, 5x Denhardt's solution, 10 mM Na2EDTA, 100 µg/ml yeast RNA, and
0.1 pmole/ml labelled oligonucleotide. After hybridization for 16 hours at 60°C, filters were washed
in 2x SSC, 0.1% SDS at room temperature 4 times for 15 minutes each time.

A single plaque containing sequences which hybridized to both of the oligonucleotide probes was
isolated from 1 x 10^6 recombinant phage from the Clontech library. The sequence of the insert (called
clone #3e) in this recombinant phage was 769 bp. The insert contained a long open reading frame which
encoded several of the tryptic peptides previously sequenced: SEQ ID NO: 13, 14, 17, as well as a portion
of SEQ ID NO: 16. The reading frame continued past the 3' end of the insert, suggesting that the clone
was incomplete. In addition, the reading frame contained the IVGG- N-terminus predicted from the
protein sequencing data. The reading frame remained open in the 5' direction for another 26 codons
before terminating.
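
The agreement between the cloned reading frame and the protein sequencing data can be checked by translating the relevant stretch of SEQ ID NO:1. The Python sketch below translates nucleotides 1691 to 1780 as printed in the sequence listing, using the standard genetic code, and compares the result with SEQ ID NO:12. It is a verification aid only; the names `CODON_TABLE` and `translate` are hypothetical.

```python
from itertools import product

# Standard genetic code, built in the conventional T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# SEQ ID NO:1, nucleotides 1691-1780, copied from the listing in blocks of ten.
ORF_1691_1780 = (
    "ATTGTCGGAG" "GAAGTGACTC" "CAGAGAAGGA" "GCCTGGCCTT" "GGGTCGTTGC"
    "TCTGTATTTC" "GACGATCAAC" "AGGTCTGCGG" "AGCTTCTCTG"
)

# N-terminal protein sequence determined in Example 2.
SEQ_ID_NO_12 = "IVGGSDSREGAWPWVVALYFDDQQVCGASL"


def translate(dna: str) -> str:
    """Translate a DNA string in frame 1, ignoring any trailing partial codon."""
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


protein = translate(ORF_1691_1780)

if __name__ == "__main__":
    print(protein)
    print("matches SEQ ID NO:12:", protein == SEQ_ID_NO_12)
```

In this translation the eighth codon is AGA, which encodes arginine, consistent with the residue 8 assignment made by protein sequencing in Example 2.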

From the Lambda Zap II bovine small intestine cDNA library, 5 x 10^5 phage were screened by
hybridization with the same two oligonucleotides as above. Only two recombinant phage were isolated
which contained enterokinase-specific sequence complementary to the oligonucleotide probes. One of
these (called clone #11) was 1494 base pairs long and contained all of the light chain coding sequence
present in the cDNA clone from the Clontech library but differed 5' of the coding sequence for light
chain. It also contained the remaining 3' coding sequence and almost 80 bases of 3' untranslated
sequence. This clone also contained a significant extension of the open reading frame preceding the
N-terminal IVGG of the mature catalytic chain, extending 266 codons and remaining open at the 5' limit
of this clone (to nucleotide 893 of SEQ ID NO:1), which differed from clone #3e.

The second phage had an insert (called clone #22) which was considerably smaller, only 531 bp
(SEQ ID NO:1, nucleotides 1553-2068), the sequence of which was fully contained within the first
clone. Of interest, however, were the final 21 codons of the open reading frame contained on clone #22
which were not present in either of the other two cDNA clones (SEQ ID NO:1, nucleotides 2006-2101).
Thus, it was unclear where this sequence fit in and/or whether it was merely a cloning artifact.

EXAMPLE 5 
COMPARISON OF DIFFERENT cDNA CLONES 

Comparison of the Clontech library clone #3e with clone #11 revealed that the two sequences
diverge at almost exactly the point at which the 5' open reading frame of clone #3e terminates.
Examination of the DNA sequence surrounding this point reveals a potential mRNA splice site [Padgett
et al., Ann. Rev. Biochem. 55:1119-1150 (1986)], and leaves open the possibility that clone #3e contains
an unspliced intron which interrupts the open reading frame (ORF). Further support for this possibility
comes from the identification of a tryptic sequence:

SEQ ID NO:22 

LVTQEVSPK 

isolated from the 150,000 dalton protein fragment which matches the ORF sequence immediately 
preceding the IVGG N-terminal sequence of the mature catalytic chain in clone #11. This tryptic 
sequence is interrupted by divergent sequence in clone #3e. In addition, two other tryptic peptides: 

SEQ ID NO:23 

A-FTTGYGLGIPEP and 

SEQ ID NO:24 

LF-GTTDSSGLVQF 

isolated from the 150,000 dalton enterokinase protein band match two regions of the translated ORF of
clone #11 upstream of the catalytic chain coding sequence. Therefore, this upstream ORF apparently
represents the coding sequence for the non-catalytic (heavy) chain, which is believed to be generated
from a single proteolytic cleavage immediately prior to the mature catalytic chain N-terminal sequence
to separate the mature catalytic chain from the non-catalytic chain.

EXAMPLE 6 

ISOLATION OF ADDITIONAL 
ENTEROKINASE CODING SEQUENCE 

Nested oligonucleotide primers were synthesized which were complementary to the lambda DNA
sequence adjacent to the cloning site for the cDNA insertions. These primers are shown as Lambda
Primers below. In addition, primers were designed which are complementary to the plus strand of the
most 5' region of the enterokinase coding sequence as described, supra. These primers are shown as
EK Primers below. The innermost primers were designed to contain a 5' extension (shown in bold) to
encode a restriction endonuclease cleavage site.

Lambda Primers

SEQ ID NO:25

5' CTATAGACTGCTGGGTAGTCCCC 3' OUTER

SEQ ID NO:26

5' ATAAGAATGCGGCCGCAAGTTCAGCCTGGTTAAGTCCAAGC 3' INNER

EK Primers

SEQ ID NO:27

5' CCAAATACAGAAAGCCTGATTAGGG 3' OUTER

SEQ ID NO:28

5' GTAGGTCGACCGTGAATGTTGTATTTGGCTCCC 3' INNER
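
The design of the primers listed above can be checked against SEQ ID NO:1. The Python sketch below locates the NotI (GCGGCCGC) and SalI (GTCGAC) recognition sequences in the inner primers and confirms that the EK primers are the reverse complement of the plus strand. It is a verification aid under stated assumptions: the plus-strand fragment is nucleotides 921 to 1000 as printed in the sequence listing, the first ten bases of SEQ ID NO:28 are treated as the non-annealing 5' extension, and the helper names are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

# Recognition sequences of the two enzymes used for subcloning.
SITES = {"NotI": "GCGGCCGC", "SalI": "GTCGAC"}

LAMBDA_INNER = "ATAAGAATGCGGCCGCAAGTTCAGCCTGGTTAAGTCCAAGC"   # SEQ ID NO:26
EK_OUTER = "CCAAATACAGAAAGCCTGATTAGGG"                        # SEQ ID NO:27
EK_INNER = "GTAGGTCGACCGTGAATGTTGTATTTGGCTCCC"                # SEQ ID NO:28

# Plus strand of SEQ ID NO:1, nucleotides 921-1000, copied from the listing.
PLUS_921_1000 = (
    "ATGACCTGTG" "GGAGCCAAAT" "ACAACATTCA" "CGTCTATAAA"
    "CTTCCCAAAC" "AGCTACCCTA" "ATCAGGCTTT" "CTGTATTTGG"
)


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def find_sites(primer: str) -> dict:
    """Map enzyme name to the 0-based offset of its site, for sites present in the primer."""
    return {name: primer.find(site) for name, site in SITES.items() if site in primer}


# Offsets (0-based, relative to nucleotide 921) where each EK primer anneals.
outer_offset = PLUS_921_1000.find(reverse_complement(EK_OUTER))
inner_offset = PLUS_921_1000.find(reverse_complement(EK_INNER[10:]))

if __name__ == "__main__":
    print(find_sites(LAMBDA_INNER), find_sites(EK_INNER))
    print("outer anneals at nt", 921 + outer_offset, "inner at nt", 921 + inner_offset)
```

The inner EK primer anneals upstream of the outer one on the plus strand, which is the arrangement expected for nested amplification toward the 5' end of the cDNA.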
Nested PCR was then performed as follows: each 100 µl reaction contained 1x10 s recombinant phage
from the Clontech bovine small intestine lambda gt10 cDNA library, 1 µmole of each outer primer, 200
µM dNTPs, and 1 unit of Amplitaq (Perkin-Elmer Cetus) in a final concentration of 10 mM Tris-HCl
pH 8.3, 50 mM KCl, 1.5 mM MgCl2. Thirty-five cycles were performed under the following
conditions: 94°C, 1 minute; 65°C, 2 minutes; 72°C, 2 minutes. Five microliters were removed from
this reaction and used as template for another 35 cycles utilizing the inner primers and the same reaction
conditions. The products from this reaction were then run on a 1% polyacrylamide gel, stained in a
solution of 0.5 µg/ml ethidium bromide and visualized under UV light. The resulting bands were
excised, electroeluted, and digested with NotI and SalI prior to subcloning into a pBluescript SK+
(Stratagene) NotI/SalI vector. The resulting subclones were sequenced, and additional DNA sequence

as a hybridization probe to screen cDNA or genomic libraries of other species at reduced stringency
allows isolation of the desired enterokinase gene. Alternatively, oligonucleotides which encompass the
DNA sequence encoding the regions surrounding the "catalytic triad", i.e., His41, Asp92, and Ser187, are
likely to be most highly conserved and most useful for cross-species hybridization.

EXAMPLE 8 

EXPRESSION OF THE GENE 
ENCODING THE CATALYTIC 
DOMAIN OF BOVINE ENTEROKINASE 

A. CHO Cell Expression 

1. PACE 

The DNA sequence encoding the catalytic domain (nucleotides 1691 to 2398) of bovine
enterokinase was fused in-frame to the 3' end of the DNA encoding the signal peptide and pro-region
of the human PACE gene. PACE is a mammalian serine protease which cleaves after dibasic residues,
and is responsible for propeptide processing of a number of secreted proteins [Wise et al., Proc. Natl.
Acad. Sci. USA 87:9378-9382 (1990)]. When expressed in CHO cells, this sequence was translated to
produce a chimeric protein which was secreted with subsequent signal peptide processing to yield
pro-enterokinase. The PACE pro-peptide contains a sequence (-Arg-Thr-Lys-Arg-) at the C-terminal junction
with mature enterokinase sequence; this is the cleavage site for the PACE enzyme. CHO cells also
produce endogenous levels of PACE. During secretion of the PACE pro/enterokinase light chain, host
PACE cleaved the pro-peptide from the N-terminus of the enterokinase, resulting in secretion of mature
enterokinase catalytic domain to the conditioned media. Immunoprecipitation experiments using rabbit
polyclonal antisera raised against bovine-derived enterokinase revealed that a 42 kD product was secreted
into the conditioned media.

This conditioned media contained cleaving activity toward the fluorogenic enterokinase substrate
Gly-(Asp4)-Lys-βNA (Bachem Bioscience) (corresponding to approximately 50-500 ng/ml depending on
the cell line). This activity was inhibited by the addition of either soybean trypsin inhibitor (STI, Sigma)
or bovine pancreatic trypsin inhibitor (BPTI, Sigma). It has been reported that the bovine holoenzyme,
i.e., having both heavy and light chains, is inhibited by only BPTI and not STI, while the partially
reduced and alkylated light chain is inhibited by both [Light et al., J. Biol. Chem. 259:13195-13198
(1984)]. In addition, incubation of this conditioned media with a partially purified fusion protein
of E. coli thioredoxin/human IL-11 which contains an interdomain spacer consisting of the enterokinase
cleavage sequence (-Gly-Ser-Gly-Ser-Gly-[Asp4]-Lys-Asn-) resulted in total and specific cleavage of this
fusion protein into its two component domains (thioredoxin and IL-11), with cleavage occurring between
the Lys and Asn residues in the spacer sequence. In addition, this CHO-produced recombinant
enterokinase catalytic domain was capable of specifically cleaving other fusion proteins containing this
same spacer, for instance an E. coli thioredoxin/human MIP-1α fusion and an E. coli thioredoxin/human
MIF fusion, into their component parts. This cleavage was confirmed by SDS-PAGE analysis of the
cleavage products.
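
The position of the cut within the spacer can be made concrete with a short in silico digest. The Python sketch below splits a protein sequence after every (Asp)4-Lys motif, which places the break between Lys and Asn in the spacer described above. It is a deliberate simplification: it models only the canonical recognition sequence, says nothing about efficiency or secondary sites, and the function name `cleave` is hypothetical. Only the spacer itself is used as input, since the fusion partner sequences are not given in this section.

```python
import re
from typing import List

# Canonical enterokinase recognition sequence: four Asp followed by Lys.
EK_SITE = re.compile(r"D{4}K")

# Interdomain spacer from the thioredoxin/IL-11 fusion, in one-letter code.
SPACER = "GSGSGDDDDKN"


def cleave(protein: str) -> List[str]:
    """Split a one-letter protein sequence after each (Asp)4-Lys motif."""
    fragments, start = [], 0
    for match in EK_SITE.finditer(protein):
        fragments.append(protein[start:match.end()])
        start = match.end()
    if start < len(protein):
        fragments.append(protein[start:])   # remainder after the last site
    return fragments


if __name__ == "__main__":
    print(cleave(SPACER))
```

Because the cut falls on the C-terminal side of the lysine, the downstream fragment begins with the residue that follows the motif, here the Asn of the spacer.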

The relative molar activities were as follows:

EFFICIENCY OF CLEAVAGE

Substrate               Holoenzyme    CHO Produced
                                      Light Chain

Gly-(Asp4)-Lys-βNA           1              1
Trypsinogen                100              1
Trx/IL-11                    1             25

Quite surprisingly, the CHO-produced light chain is 25 times more effective than bovine-derived
holoenzyme when used to cleave a thioredoxin/IL-11 fusion protein containing an enterokinase cleavage
site between the two protein domains. This dramatic difference is duplicated on the other fusion proteins
listed above. In addition, secondary proteolysis due to contaminating serine proteases (e.g., trypsin and
chymotrypsin) which co-purify with bovine-derived holoenzyme is absent with the recombinant single
chain form. As such, recombinant single chain enterokinase is a superior reagent for fusion protein
cleavage.
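
The comparison drawn from the table can be expressed as a ratio for each substrate. The short Python sketch below uses only the trypsinogen and Trx/IL-11 rows and computes the activity of the light chain relative to the holoenzyme. Each row of the table is normalized separately, so the ratios are meaningful within a row only; the names `RELATIVE_ACTIVITY` and `light_chain_vs_holoenzyme` are hypothetical.

```python
# Relative molar activities from the table (each row is normalized on its own).
RELATIVE_ACTIVITY = {
    "Trypsinogen": {"holoenzyme": 100, "light_chain": 1},
    "Trx/IL-11":   {"holoenzyme": 1,   "light_chain": 25},
}


def light_chain_vs_holoenzyme(substrate: str) -> float:
    """Fold activity of the CHO-produced light chain relative to the holoenzyme."""
    row = RELATIVE_ACTIVITY[substrate]
    return row["light_chain"] / row["holoenzyme"]


if __name__ == "__main__":
    for substrate in RELATIVE_ACTIVITY:
        print(substrate, light_chain_vs_holoenzyme(substrate))
```

On these figures the light chain is far less active than the holoenzyme on the natural substrate trypsinogen while being more active on the fusion protein.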

2. Modified PACE 

This expression system has been improved by co-expression of a modified version of the PACE
gene which has had the transmembrane domain deleted [Rehemtulla et al., Blood 79:2349 (1992)]. This
overexpressed and secreted PACE efficiently processes the PACE pre/pro-enterokinase and allows
greater processing capability for enterokinase overexpression, as endogenous PACE levels in CHO cells
are low and incapable of processing highly expressed proenterokinase. Thus, at high expression levels,
endogenous PACE activity becomes limiting, with some enterokinase remaining unprocessed and resulting
in some inactive material. Increasing soluble PACE levels allows for the accumulation of high levels of
properly processed, active enterokinase in the conditioned media.

3. Trypsinogen 

Constructs were also prepared which fused the DNA encoding the pre/pro region of bovine
anionic trypsinogen [Le Heurou et al., Eur. J. Biochem. 193:767-773 (1990)] in-frame to the DNA
sequence encoding the mature enterokinase catalytic domain. The pro region of trypsinogen contains
an enterokinase cleavage site (Asp4-Lys) as it is the natural substrate of enterokinase, and this construct
was designed to produce secreted enterokinase "zymogen" with the trypsinogen propeptide attached to
its N-terminus which could then be activated by addition of enterokinase to initiate autocatalytic
processing. Expression of this construct in CHO cells resulted in mostly intracellular accumulation;
however, the small amount of material secreted gave undetectable levels of activity in a fluorogenic
enterokinase peptide assay. In addition, activity was not stimulated by the addition of enterokinase to
the proprotein. It appears that this chimeric protein is not capable of forming an active species. It is
speculated that the light chain benefits from translational fusion with a large protein domain (analogous
to heavy chain) which is post-translationally removed to allow the active conformation of light chain to
form. The PACE pro-peptide functioned effectively in this capacity.

B. Expression In E. Coli 

In an effort to increase solubility and produce enterokinase with an authentic N-terminus, the
coding sequence for the catalytic chain was fused in-frame to the 3' end of the E. coli thioredoxin gene
[Lunn et al., J. Biol. Chem. 259:10469-10474 (1984)] with a spacer which encodes an enterokinase
cleavage site (-Gly-Ser-Gly-Ser-Gly-[Asp4]-Lys). This construct is under the transcriptional control of
the lambda pL promoter [Shimatake et al., supra] on a multicopy plasmid, and directs the cytoplasmic
expression of a thioredoxin/enterokinase catalytic domain fusion protein. A portion of the expressed
fusion protein is soluble when expressed at 17°C, and full solubility can be achieved by lysing the cells
in the presence of low levels of urea (e.g., 3 M). This fusion protein can be purified from cell lysates
and cleaved with enterokinase to generate active enterokinase. The intent of this construct is to allow
autocatalytic processing of the fusion protein, i.e., cleavage is begun by a small amount of active
enterokinase (either holoenzyme or catalytic chain), and as active catalytic chain is released from its
fusion partner it can then continue to cleave remaining fusion protein in the reaction. At least partial
purification of the fusion protein is necessary to eliminate inhibitor(s) of enterokinase present in E. coli
cell lysates. Active light chain, specifically inhibited by STI, is produced.

Alternatively, other fusion partners may also be employed. For instance, the E. coli maltose-
binding protein, a secreted protein which has been described as a competent fusion partner [Maina et al.,
Gene 74:365 (1988)], has been used with success. We anticipate that other fusion strategies may also
serve to allow proper folding and provide a means to produce authentic, active enterokinase light chain.

C. Expression in Saccharomyces cerevisiae 
1. PACE 

The expression construct described for use in CHO cells utilizing the PACE pre/pro sequence
fused to the 5' end of the coding sequence for mature bovine enterokinase catalytic chain can also be
used for enterokinase secretion from Saccharomyces cerevisiae. This yeast has been shown to produce
an enzyme called Kex2 [Julius et al., Cell 37:1075 (1984)] which cleaves on the C-terminal side of
dibasic residues, similarly to PACE. Co-expression of the yeast kex2 gene with the PACE pre/pro-
bovine enterokinase light chain construct in COS cells results in complete processing of the PACE
pre/pro sequence to yield a product which is immunoprecipitable with bovine enterokinase antisera and
co-migrates with PACE-processed enterokinase light chain after separation of the products on SDS-
PAGE. Thus, yeast Kex2 recognized and cleaved the PACE cleavage site in the PACE pro-sequence
to produce mature enterokinase.

The coding sequence for this chimeric construct (mammalian PACE secretory leader and
propeptide sequence followed by the mature bovine enterokinase light chain sequence) was inserted into
a yeast expression vector to produce and secrete the fusion protein. The host Kex2 protein is expected
to cleave off the PACE pro-peptide following the Arg-Thr-Lys-Arg sequence, resulting in secretion of
properly processed mature enterokinase light chain. The Kex2 protein may be co-expressed to increase
processing activity if needed. Such over-expression of Kex2 may be accomplished with either the native
protein or with a soluble derivative lacking the C-terminal transmembrane domain as described by
Brenner et al., supra. This form of Kex2 is analogous to the soluble PACE co-expressed with the
PACE pre/pro-bovine enterokinase light chain used in the CHO cell expression. Alternatively,
mammalian PACE can be co-expressed in yeast to accentuate pro-peptide processing of the chimeric
enterokinase construct either in the presence or absence of host endogenous levels of Kex2.

2. α-Factor

Alternatively, the coding sequence for mature bovine enterokinase light chain can be fused to
the coding sequence for the secretory leader and pro-peptide of, for instance, the α-factor protein from
S. cerevisiae, a protein which is normally secreted and subsequently processed by Kex2 [Julius et al.,
Cell 32:839 (1983)]. This construct is expected to produce material similar to the other construct
described above; that is, properly processed and active enterokinase light chain which accumulates in
the culture media in active form.

EXAMPLE 9

CO-EXPRESSION OF FUSION PROTEINS 
AND ACTIVE ENTEROKINASE 

A configuration advantageous in some situations co-expresses active enterokinase along with a
fusion protein which is to be subsequently cleaved. The fusion can be segregated by cell
compartmentalization during cell growth and fusion protein synthesis, thereby allowing the desirable
effects of fusion proteins (e.g., stabilization, solubility) to remain. Then, upon cell lysis, the active
enterokinase is allowed to mix (co-mingle) with the expressed fusion protein and cleave it, thereby
simplifying the downstream processing of the fusion protein. One method for accomplishing this is to
secrete active enterokinase into the periplasmic space of E. coli, while producing a fusion protein in the
cytoplasm. Other methods can be equally suitable, for instance co-secretion of enterokinase and a fusion
protein in CHO cells, analogous to the co-secretion of PACE and the PACE pro/enterokinase fusion
employed for CHO production of active enterokinase as described, supra. Another method is co-
expression of an enterokinase fusion protein (e.g., Trx/enterokinase light chain with an enterokinase
cleavage site between them) and a fusion protein containing a desired protein product, also with an
enterokinase site between the domains. The enterokinase is expected to remain inactive until purified
and until concentrated to the point where autocatalysis occurs, whereupon the co-purified desired fusion
protein will also be processed.

EXAMPLE 10 

USE OF ENTEROKINASE AS 
A THERAPEUTIC AGENT 

A condition exists in humans whereby the ability to digest protein is severely impaired [Hadorn
et al., Lancet 1:812-813 (1969); Tarlow et al., Arch. Dis. Child. 45:651-655 (1970)]. Studies on these
patients have revealed that they are deficient in the production of enterokinase (enteropeptidase), which
is necessary for the conversion of trypsinogen to trypsin, which in turn activates the numerous pancreatic
zymogens responsible for digestion. Duodenal juice from these patients cannot activate
trypsinogen in vitro, but addition of purified enterokinase to this duodenal juice results in activation of
proteolytic enzymes, suggesting that the inactive zymogens are present and able to be activated [Hadorn
et al., supra]. This condition has been treated in the past with pancreatic extracts.

A recombinant enterokinase may be used as a therapy for this condition. When formulated to 
allow oral administration, the enzyme enters the duodenum where it encounters the inactive pancreatic 
zymogens entering from the pancreatic duct. There it activates trypsinogen which in turn activates the 
other zymogens, and proper digestion proceeds. The human form of enterokinase gene may also be 
useful in gene therapy to correct this condition. 

The foregoing illustrative examples relate to the isolation and characterization of nucleic acid 
sequences encoding enterokinase activity, as well as the corresponding transcription and translation 
thereof to yield the corresponding proteins and polypeptides. Also described are the uses of these 
proteins either as a heavy and light chain together, or a light chain alone. 

While the present invention has been described in terms of specific methods and compositions, 
it is understood that variations and modifications will occur to those skilled in the art upon consideration 
of the present invention. 

Numerous modifications and variations in the invention as described in the above illustrative
examples are expected to occur to those skilled in the art, and consequently only such limitations as
appear in the appended claims should be placed thereon. Accordingly, it is intended in the appended
claims to cover all such equivalent variations which come within the scope of the invention as claimed.

SEQUENCE LISTING

(1) GENERAL INFORMATION:
(i) APPLICANT:
(C) CITY: Cambridge
(ii) TITLE OF INVENTION: CLONING OF ENTEROKINASE AND METHOD OF USE
(iii) NUMBER OF SEQUENCES: 33
(iv) COMPUTER READABLE FORM:
(A) MEDIUM TYPE: Floppy disk
(B) COMPUTER: IBM PC compatible
(C) OPERATING SYSTEM: PC-DOS/MS-DOS
(D) SOFTWARE: PatentIn Release #1.0, Version #1.25 (EPO)

(2) INFORMATION FOR SEQ ID NO:1:
(B) TYPE: nucleic acid
(ii) MOLECULE TYPE: cDNA
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:1:
CGGAGCTTGT GATGGAAGAT TTTTGTTGAC TGGATCTTCT CGC. 

TTATCCCAAG CCTTCTAATA ATACAAGCGC TCTTTG^ AGGCTCTGCA 
AGCACTTTCC ATTCAACTGA ACTTCGATTA TTTTAATACA T^ 

TATTTATGAA GGAATGGGTT CAAGCAAGAT Z ""^ 

TGGCATAATT AGGATTTTTT CCAATC TCTCTCTGGT CAAATAATCC 

-~ — — — — 

»—» ATTTTGAAGA 

^TOACAAT GGACTCAGOG AAGC ° ""~ »» 

— CCAC AC^ AroM ^ — CGGACCAA, 

—A GAAAGAGTAO GAC^AAC 

^» ™— • 

~« CAAAACATGG AGAAGACAAT „ ' OC " 1 " 1 " 

™*«=TAT GGACAAGTAA CATTAAATGA J. «"°™«T ATGGACAAAA 

•»—- cag,^. gJ"«; rr m ™— -»■—. 

""GGATGAC ATTAGCCTAA CATATGGGAT 



60 
120 
ISO 
240 
300 
360 
420 
480 
540 
600 
660 
720 
780 
840 
TTGTAATATG AGTGTCTATC CAGAACCAAC TTTAGTCCCA ACTCCTCCAC CAGAACTTCC 900 
CACGGACTGT GGAGGGCCTC ATGACCTGTG GGAGCCAAAT ACAACATTCA CGTCTATAAA 960 
CTTCCCAAAC AGCTACCCTA ATCAGGCTTT CTGTATTTGG AATTTAAATG CACAAAAGGG 1020 
AAAAAATATT CAGCTCCACT TTCAAGAATT TGACCTGGAA AATATTGCAG ATGTAGTTGA 1080 
AATCAGAGAT GGTGAAGGAG ATGATTCCTT GTTCTTAGCT GTGTACACAG GCCCTGGTCC 1140 
AGTAAACGAT GTGTTCTCAA CCACCAACCG AATGACTGTG CTTTTTATCA CTGATAATAT 1200 
GCTGGCAAAA CAGGGATTTA AAGCAAATTT CACTACTGGC TATGGCTTGG GGATTCCAGA 1260 
ACCCTGCAAG GAAGACAATT TTCAGTGCAA GGATGGGGAG TGTATTCCGC TGGTGAATCT 1320 
CTGTGACGGT TTTCCACACT GTAAGGATGG CTCAGATGAA GCACACTGTG TGCGTCTCTT 1380 
CAATGGCACG ACAGACAGCA GTGGTTTGGT GCAGTTCAGG ATCCAAAGCA TATGGCATGT 1440 
AGCCTGTGCC GAGAACTGGA CAACCCAGAT CTCAGATGAT GTGTGTCAGC TGCTGGGACT 1500 
AGGGACTGGA AACTCATCCG TGCCAACCTT TTCTACTGGA GGTGGACCAT ATGTAAATTT 1560 
AAACACAGCA CCTAATGGCA GCTTAATACT AACGCCAAGC CAACAGTGCT TAGAGGATTC 1620 
ACTGATTTTG CTACAATGTA ACTACAAATC ATGTGGGAAA AAACTGGTGA CTCAAGAAGT 1680 
TAGCCCGAAG ATTGTCGGAG GAAGTGACTC CAGAGAAGGA GCCTGGCCTT GGGTCGTTGC 1740 
TCTGTATTTC GACGATCAAC AGGTCTGCGG AGCTTCTCTG GTGAGCAGGG ATTGGCTGGT 1800 
GTCGGCCGCC CACTGCGTGT ACGGGAGAAA TATGGAGCCG TCTAAGTGGA AAGCAGTGCT 1860 
AGGCCTGCAT ATGGCATCAA ATCTGACTTC TCCTCAGATA GAAACTAGGT TGATTGACCA 1920 
AATTGTCATA AACCCACACT ACAATAAACG GAGAAAGAAC AATGACATTG CCATGATGCA 1980 
TCTTGAAATG AAAGTGAACT ACACAGATTA TATACAGCCT ATTTGTTTAC CAGAAGAAAA 2040 
TCAAGTTTTT CCCCCAGGAA GAATTTGTTC TATTGCTGGC TGGGGGGCAC TTATATATCA 2100 
AGGTTCTACT GCAGACGTAC TGCAAGAAGC TGACGTTCCC CTTCTATCAA ATGAGAAATG 2160 
TCAACAACAG ATGCCAGAAT ATAACATTAC GGAAAATATG GTGTGTGCAG GCTATGAAGC 2220 
AGGAGGGGTA GATTCTTGTC AGGGGGATTC AGGCGGACCA CTCATGTGCC AAGAAAACAA 2280 
CAGATGGCTC CTGGCTGGCG TGACGTCATT TGGATATCAA TGTGCACTGC CTAATCGCCC 2340 
AGGGGTGTAT GCCCGGGTCC CAAGGTTCAC AGAGTGGATA CAAAGTTTTC TACATTAGAG 2400 
TGTTTCCAGA AACAAAGATG AAAATCAGGC AGTTTTCCCA TTTCACTTTA AGAAGCATGG 2460 
AAATTGAGAG TTAAAAAAAT AATAATTTAT AAAAGTCTTG ATTCTTACCT AAGGCACTGA 2520 
AATGCTACAA AAAAAAAAAA ACCGGAATTC AGCTTGGACT TAACCAGGCT GAACTTGCGG 2580 
C 2581 




(2) INFORMATION FOR SEQ ID NO: 2: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 798 amino acids 
(B) TYPE: amino acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 
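(ii) MOLECULE TYPE: protein 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 2: 

Gly Ala Cys Asp Gly Arg Phe Leu Leu Thr Gly Ser Ser Gly Ser Phe 
1 5 10 15 

Glu Ala Leu His Tyr Pro Lys Pro Ser Asn Asn Thr Ser Ala Val Cys 
20 25 30 

Arg Trp Ile Ile Arg Val Asn Gln Gly Leu Ser Ile Gln Leu Asn Phe 
35 40 45 

Asp Tyr Phe Asn Thr Tyr Tyr Ala Asp Val Leu Asn Ile Tyr Glu Gly 
50 55 60 

Met Gly Ser Ser Lys Ile Leu Arg Ala Ser Leu Trp Ser Asn Asn Pro 
65 70 75 80 

Gly Ile Ile Arg Ile Phe Ser Asn Gln Val Thr Ala Thr Phe Leu Ile 
85 90 95 

Gln Ser Asp Glu Ser Asp Tyr Ile Gly Phe Lys Val Thr Tyr Thr Ala 
100 105 110 

Phe Asn Ser Lys Glu Leu Asn Asn Tyr Glu Lys Ile Asn Cys Asn Phe 
115 120 125 

Glu Asp Gly Phe Cys Phe Trp Ile Gln Asp Leu Asn Asp Asp Asn Glu 
130 135 140 

Trp Glu Arg Thr Gln Gly Ser Thr Phe Pro Pro Ser Thr Gly Pro Thr 
145 150 155 160 

Phe Asp His Thr Phe Gly Asn Glu Ser Gly Phe Tyr Ile Ser Thr Pro 
165 170 175 

Thr Gly Pro Gly Gly Arg Arg Glu Arg Val Gly Leu Leu Thr Leu Pro 
180 185 190 

Leu Asp Pro Thr Pro Glu Gln Ala Cys Leu Ser Phe Trp Tyr Tyr Met 
195 200 205 

Tyr Gly Glu Asn Val Tyr Lys Leu Ser Ile Asn Ile Ser Ser Asp Gln 
210 215 220 

Asn Met Glu Lys Thr Ile Phe Gln Lys Glu Gly Asn Tyr Gly Gln Asn 
225 230 235 240 

Trp Asn Tyr Gly Gln Val Thr Leu Asn Glu Thr Val Glu Phe Lys Val 
245 250 255 

Ser Phe Tyr Gly Phe Lys Asn Gln Ile Leu Ser Asp Ile Ala Leu Asp 
260 265 270 

Asp Ile Ser Leu Thr Tyr Gly Ile Cys Asn Met Ser Val Tyr Pro Glu 
275 280 285 

Pro Thr Leu Val Pro Thr Pro Pro Pro Glu Leu Pro Thr Asp Cys Gly 
290 295 300 

Gly Pro His Asp Leu Trp Glu Pro Asn Thr Thr Phe Thr Ser Ile Asn 
305 310 315 320 

Phe Pro Asn Ser Tyr Pro Asn Gln Ala Phe Cys Ile Trp Asn Leu Asn 
325 330 335 
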
Ala Gln Lys Gly Lys Asn Ile Gln Leu His Phe Gln Glu Phe Asp Leu 
340 345 350 

Glu Asn Ile Ala Asp Val Val Glu Ile Arg Asp Gly Glu Gly Asp Asp 
355 360 365 

Ser Leu Phe Leu Ala Val Tyr Thr Gly Pro Gly Pro Val Asn Asp Val 
370 375 380 

Phe Ser Thr Thr Asn Arg Met Thr Val Leu Phe Ile Thr Asp Asn Met 
385 390 395 400 

Leu Ala Lys Gln Gly Phe Lys Ala Asn Phe Thr Thr Gly Tyr Gly Leu 
405 410 415 

Gly Ile Pro Glu Pro Cys Lys Glu Asp Asn Phe Gln Cys Lys Asp Gly 
420 425 430 

Glu Cys Ile Pro Leu Val Asn Leu Cys Asp Gly Phe Pro His Cys Lys 
435 440 445 

Asp Gly Ser Asp Glu Ala His Cys Val Arg Leu Phe Asn Gly Thr Thr 
450 455 460 

Asp Ser Ser Gly Leu Val Gln Phe Arg Ile Gln Ser Ile Trp His Val 
465 470 475 480 

Ala Cys Ala Glu Asn Trp Thr Thr Gln Ile Ser Asp Asp Val Cys Gln 
485 490 495 

Leu Leu Gly Leu Gly Thr Gly Asn Ser Ser Val Pro Thr Phe Ser Thr 
500 505 510 

Gly Gly Gly Pro Tyr Val Asn Leu Asn Thr Ala Pro Asn Gly Ser Leu 
515 520 525 

Ile Leu Thr Pro Ser Gln Gln Cys Leu Glu Asp Ser Leu Ile Leu Leu 
530 535 540 

Gln Cys Asn Tyr Lys Ser Cys Gly Lys Lys Leu Val Thr Gln Glu Val 
545 550 555 560 

Ser Pro Lys Ile Val Gly Gly Ser Asp Ser Arg Glu Gly Ala Trp Pro 
565 570 575 

Trp Val Val Ala Leu Tyr Phe Asp Asp Gln Gln Val Cys Gly Ala Ser 
580 585 590 

Leu Val Ser Arg Asp Trp Leu Val Ser Ala Ala His Cys Val Tyr Gly 
595 600 605 

Arg Asn Met Glu Pro Ser Lys Trp Lys Ala Val Leu Gly Leu His Met 
610 615 620 

Ala Ser Asn Leu Thr Ser Pro Gln Ile Glu Thr Arg Leu Ile Asp Gln 
625 630 635 640 

Ile Val Ile Asn Pro His Tyr Asn Lys Arg Arg Lys Asn Asn Asp Ile 
645 650 655 

Ala Met Met His Leu Glu Met Lys Val Asn Tyr Thr Asp Tyr Ile Gln 
660 665 670 

Pro Ile Cys Leu Pro Glu Glu Asn Gln Val Phe Pro Pro Gly Arg Ile 
675 680 685 

Cys Ser Ile Ala Gly Trp Gly Ala Leu Ile Tyr Gln Gly Ser Thr Ala 
690 695 700 

Asp Val Leu Gln Glu Ala Asp Val Pro Leu Leu Ser Asn Glu Lys Cys 
705 710 715 720 

Gln Gln Gln Met Pro Glu Tyr Asn Ile Thr Glu Asn Met Val Cys Ala 
725 730 735 

Gly Tyr Glu Ala Gly Gly Val Asp Ser Cys Gln Gly Asp Ser Gly Gly 
740 745 750 

Pro Leu Met Cys Gln Glu Asn Asn Arg Trp Leu Leu Ala Gly Val Thr 
755 760 765 

Ser Phe Gly Tyr Gln Cys Ala Leu Pro Asn Arg Pro Gly Val Tyr Ala 
770 775 780 

Arg Val Pro Arg Phe Thr Glu Trp Ile Gln Ser Phe Leu His 
785 790 795 

(2) INFORMATION FOR SEQ ID NO: 3: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 27 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: protein 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 3: 

Ile Val Gly Gly Ser Asp Ser Tyr Glu Gly Ala Trp Pro Trp Val Val 
1 5 10 15 

Ala Leu Tyr Phe Asp Asp Gln Gln Val Cys Gly 
20 25 

(2) INFORMATION FOR SEQ ID NO: 4: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 21 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: cDNA 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 4: 
CTCGAATTCA TGTGGGGTCG A 
(2) INFORMATION FOR SEQ ID NO: 5: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 21 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: cDNA 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 5: 
CTCGAATTCA TGTGGGGAGG A 
(2) INFORMATION FOR SEQ ID NO: 6: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 6 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: protein 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 6: 

Asp Gln Gln Val Cys Gly 
1 5 

(2) INFORMATION FOR SEQ ID NO: 7: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 21 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: cDNA 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 7: 
TCCAAGCTTC CCAACTGTGT C 
(2) INFORMATION FOR SEQ ID NO: 8: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 6 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: protein 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 8: 

Tyr Glu Gly Ala Trp Pro 
1 5 

(2) INFORMATION FOR SEQ ID NO: 9: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 17 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: cDNA 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 9: 
TAYGARGGNG CNTCCCC 

(2) INFORMATION FOR SEQ ID NO: 10: 

(i) SEQUENCE CHARACTERISTICS: 

(B) TYPE: amino acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: protein 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 10: 

Phe Asp Asp Gln Gln Val 
1 5 

(2) INFORMATION FOR SEQ ID NO: 11: 

(i) SEQUENCE CHARACTERISTICS: 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: cDNA 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 11: 
TCCAAGCTTA CTGTGTCTCA A 

(2) INFORMATION FOR SEQ ID NO: 12: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 30 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: protein 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 12: 

Ile Val Gly Gly Ser Asp Ser Arg Glu Gly Ala Trp Pro Trp Val Val 
1 5 10 15 

[residues 17 to 30 illegible in source] 

(2) INFORMATION FOR SEQ ID NO: 13: 
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 24 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: protein 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 13: 

Glu Gly Ala Trp Pro Trp Val Val Ala Leu Tyr Phe Asp Asp Gln Gln 
1 5 10 15 

Val Cys Gly Ala Ser Leu Val Ser 
20 

(2) INFORMATION FOR SEQ ID NO: 14: 



(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 13 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS : single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: protein 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 14: 

Asp Trp Leu Val Ser Ala Ala His Cys Val Tyr Gly Arg 
1 5 10 

(2) INFORMATION FOR SEQ ID NO: 15: 



(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 10 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS : single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: protein 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 15: 

Phe Thr Glu Trp Ile Gln Ser Phe Leu His 
1 5 10 

(2) INFORMATION FOR SEQ ID NO: 16: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 23 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: protein 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 16: 

Ile Cys Ser Ile Ala Gly Trp Gly Ala Leu Ile Tyr Gln Gly Ser Thr 
1 5 10 15 

Ala Asp Val Leu Gln Glu Ala 
20 

(2) INFORMATION FOR SEQ ID NO: 17: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 23 amino acids 
(B) TYPE: amino acid 

(C) STRANDEDNESS: single 

(ii) MOLECULE TYPE: protein 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 17: 

[sequence largely illegible in source; legible fragments: Ser Phe ... Tyr Gln Cys Ala Leu Pro ... Asn Asn Pro Gly Val Tyr] 

(2) INFORMATION FOR SEQ ID NO: 18: 

(i) SEQUENCE CHARACTERISTICS: 

(B) TYPE: amino acid 

(ii) MOLECULE TYPE: protein 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 18: 

[sequence largely illegible in source; legible fragment: Ser Lys] 

(2) INFORMATION FOR SEQ ID NO: 19: 

(i) SEQUENCE CHARACTERISTICS: 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: cDNA 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 19: 
CGCGGATCCC CRTANACRCA RTGNGC 

(2) INFORMATION FOR SEQ ID NO: 20: 

(i) SEQUENCE CHARACTERISTICS: 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: cDNA 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 20: 
CCGGAATTCT TGGGTCGTTG CTCTGTAT 

(2) INFORMATION FOR SEQ ID NO: 21: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 28 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS : single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: cDNA 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 21: 
CGCGGATCCA TACAGAGCAA CGACCCAA 
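
SEQ ID NO: 20 and SEQ ID NO: 21 appear to share a 19-nucleotide gene-specific segment in opposite orientations, each preceded by a 9-nucleotide 5' extension that contains a restriction enzyme recognition sequence (GAATTC, recognized by EcoRI, and GGATCC, recognized by BamHI). The following minimal Python sketch, which is not part of the original disclosure, checks that relationship. The helper name `reverse_complement` and the 9-nucleotide split are assumptions made for the purpose of the example.

```python
# Complement map for unambiguous DNA bases.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an unambiguous DNA string."""
    return seq.upper().translate(COMPLEMENT)[::-1]


# SEQ ID NO: 20 and SEQ ID NO: 21 as given in the listing (spaces removed).
SEQ_20 = "CCGGAATTCTTGGGTCGTTGCTCTGTAT"
SEQ_21 = "CGCGGATCCATACAGAGCAACGACCCAA"

# Assumed layout: 9-nucleotide 5' extension followed by the gene-specific part.
TAIL_LENGTH = 9
tail_20, core_20 = SEQ_20[:TAIL_LENGTH], SEQ_20[TAIL_LENGTH:]
tail_21, core_21 = SEQ_21[:TAIL_LENGTH], SEQ_21[TAIL_LENGTH:]

print(tail_20, core_20)
print(tail_21, core_21)
print(reverse_complement(core_20) == core_21)
```

Under these assumptions the two gene-specific segments are exact reverse complements of one another, so the two oligonucleotides anneal to the same stretch of SEQ ID NO: 1 on opposite strands.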
(2) INFORMATION FOR SEQ ID NO: 22: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 9 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: protein 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 22: 

Leu Val Thr Gln Glu Val Ser Pro Lys 
1 5 

(2) INFORMATION FOR SEQ ID NO: 23: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 13 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: protein 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 23: 

Ala Phe Thr Thr Gly Tyr Gly Leu Gly Ile Pro Glu Pro 
1 5 10 

(2) INFORMATION FOR SEQ ID NO: 24: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 13 amino acids 

(B) TYPE : amino acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: protein 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 24: 

Leu Phe Gly Thr Thr Asp Ser Ser Gly Leu Val Gln Phe 
1 5 10 
(2) INFORMATION FOR SEQ ID NO: 25: 

(i) SEQUENCE CHARACTERISTICS: 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: cDNA 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 25: 
CTATAGACTG CTCGCTAGTC CCC 

(2) INFORMATION FOR SEQ ID NO: 26: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 41 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: cDNA 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 26: 
[sequence illegible in source] 

(2) INFORMATION FOR SEQ ID NO: 27: 

(i) SEQUENCE CHARACTERISTICS: 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: cDNA 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 27: 
CCAAATACAG AAAGCCTGAT TAGGG 

(2) INFORMATION FOR SEQ ID NO: 28: 

(i) SEQUENCE CHARACTERISTICS: 

(ii) MOLECULE TYPE: cDNA 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 28: 
GTAGGTCGAC CGTGAATGTT GTATTTGGCT CCC 
(2) INFORMATION FOR SEQ ID NO: 29: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 13 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: protein 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 29: 

Leu Ser Ile Asn Ile Ser Ser Asp Gln Asn Met Glu Lys 
1 5 10 

(2) INFORMATION FOR SEQ ID NO: 30: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 7 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: protein 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 30: 

Val Ser Phe Tyr Gly Phe Lys 
1 5 

(2) INFORMATION FOR SEQ ID NO: 31: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: protein 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 31: 

Gln Lys Glu Gly Asn Tyr Gly Gln Asn Trp Asn Tyr Gly Gln Val Thr 
1 5 10 15 

Leu Asn Glu Thr 
20 

(2) INFORMATION FOR SEQ ID NO: 32: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 7 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: protein 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 32: 

Val Gly Leu Leu Thr Leu Pro 
1 5 

(2) INFORMATION FOR SEQ ID NO: 33: 

(i) SEQUENCE CHARACTERISTICS: 

(B) TYPE: amino acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: protein 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 33: 

Thr Ile Phe Gln Lys 
1 5 
WHAT IS CLAIMED IS: 

1. A nucleic acid sequence comprising an isolated nucleic acid sequence encoding enterokinase 
activity. 

2. The nucleic acid sequence of claim 1 encoding mammalian enterokinase. 

3. The nucleic acid sequence of claim 1 encoding bovine enterokinase. 

4. The nucleic acid sequence of claim 1 encoding human enterokinase. 

5. The nucleic acid sequence of claim 1 selected from the group consisting of genomic DNA, 
complementary DNA, and synthetic DNA. 

6. The nucleic acid sequence of claim 1 and comprising a sequence substantially as set forth in SEQ 
ID NO: 1. 

7. A nucleic acid sequence comprising a nucleic acid sequence encoding enterokinase activity and 
comprising a sequence substantially duplicative of the sequence as set forth in SEQ ID NO: 1 from 
nucleotide 1691 to nucleotide 2398. 

8. A nucleic acid sequence comprising a nucleic acid sequence comprising SEQ ID NO: 1. 

9. A nucleic acid sequence comprising a nucleic acid sequence comprising nucleotides 1691 to 2398 
of SEQ ID NO: 1. 

10. A nucleic acid sequence encoding enterokinase activity said nucleic acid sequence being selected 
from the group consisting of: 

(a) a nucleic acid sequence substantially as set forth in SEQ ID NO: 1, 

(b) a nucleic acid sequence which hybridizes to (a) under stringent conditions, and 

(c) a nucleic acid sequence which, but for the redundancy of the genetic code, would 
hybridize to (a). 

11. The nucleic acid sequence of claim 1, further comprising a second nucleic acid sequence 
encoding a member selected from the group consisting of a pre-region, pro-region, and a pre/pro region. 




12. The nucleic acid sequence of claim 11, wherein said second nucleic acid sequence is a member 
selected from the group consisting of the pre/pro region of PACE, the pre/pro region of trypsinogen, 
and the pre/pro region of yeast a-factor. 

13. The nucleic acid sequence of claim 1, further comprising a second nucleic acid sequence 
encoding a thioredoxin-like molecule. 

14. A host cell transformed or transfected with a nucleic acid sequence of claim 1. 

15. A nucleic acid sequence vector comprising a nucleic acid sequence according to claim 1. 

16. The vector of claim 15 further comprising an expression control sequence operatively associated 
with said nucleic acid sequence. 

17. The vector of claim 15 corresponding to plasmid pEK-2/GI734 designated as ATCC Deposit No. 
69232. 

18. An expression product of the nucleic acid sequence of claim 1. 

19. A method for producing enterokinase activity comprising: 

(a) growing, in culture, the host cell of claim 14, and 

(b) isolating from said host cell or said culture the polypeptide product of the expression 
of said nucleic acid sequence. 

20. A method for the production of a protein having enterokinase activity comprising: 

(a) disposing a nucleic acid sequence according to claim 1 in a cell free transcription and 
translation system, and 

(b) isolating from said system the polypeptide product of the expression of said nucleic acid 
sequence. 

21. The polypeptide product of the in vitro or in vivo expression of a nucleic acid sequence of claim 
1. 



22. A synthetic peptide duplicative of a sequence of amino acids present in enterokinase and unique 
to enterokinase. 
23. An antibody specifically immunoreactive with a polypeptide according to claim 21. 

24. A method for treating disorders associated with low levels of enterokinase activity comprising 
the step of administering a polypeptide according to claim 21. 


25. A method for treating disorders associated with low levels of enterokinase activity comprising 
the step of administering a nucleic acid sequence of claim 1. 

26. A pharmaceutical composition for use in treating disorders associated with low levels of 
enterokinase activity comprising a therapeutically effective amount of enterokinase activity according to 
claim 21 in a pharmaceutically acceptable vehicle. 

27. A method for cleaving proteins comprising the step of cleaving said protein with the expression 
product of claim 1. 


28. A method for producing proteins from fusion proteins comprising the steps of: 

(a) growing, in culture, a host cell transformed or transfected with 

(i) a nucleic acid sequence which encodes enterokinase activity and which upon 
expression is segregated into the periplasmic space, and 
(ii) one or more nucleic acid sequences which encode a fusion protein and an 
enterokinase cleavage site and which, upon expression, are segregated to the cytoplasmic space, 

(b) allowing said periplasmic space and said cytoplasmic space to co-mingle thereby, 

(c) allowing said enterokinase activity to cleave said fusion protein, and 

(d) resulting in protein production. 






WO 94/16083 PCT/US94/00616 

1/4 

FIGURE 1 

1 CGGAGCTTGTGATGGAAGATTTTTGTTGACTGGATCTTCTGGGTCCTTTG 50 
51 AGGCTCTGCATTATCCCAAGCCTTCTAATAATACAAGCGCTGTTTGTCGG 100 
101 TGGATTATACGTGTAAACCAAGGACTTTCCATTCAACTGAACTTCGATTA 150 
151 TTTTAATACATATTATGCAGATGTATTAAATATTTATGAAGGAATGGGTT 200 
201 CAAGCAAGATTTTAAGAGCTTCTCTCTGGTCAAATAATCCTGGCATAATT 250 
251 AGGATTTTTTCCAATCAAGTTACTGCCACTTTTCTTATACAGTCTGATGA 300 
301 AAGTGATTATATTGGCTTCAAAGTAACATACACTGCATTTAACAGCAAAG 350 
351 AGCTTAATAATTATGAGAAAATCAACTGTAATTTTGAAGATGGCTTCTGT 400 
401 TTCTGGATCCAGGATCTAAATGATGACAATGAGTGGGAAAGGACTCAGGG 450 
451 AAGCACCTTTCCTCCATCTACTGGACCAACTTTTGACCACACTTTTGGCA 500 
501 ATGAGTCAGGATTTTACATTTCCACCCCAACTGGACCAGGAGGAAGACGA 550 
551 GAAAGAGTAGGACTTTTAACTCTCCCTTTAGATCCCACTCCTGAACAAGC 600 
601 CTGCCTTAGTTTCTGGTATTATATGTATGGTGAAAATGTTTACAAACTAA 650 
651 GCATTAATATCAGCAGTGACCAAAACATGGAGAAGACAATTTTCCAAAAA 700 
701 GAAGGAAATTATGGACAAAATTGGAACTATGGACAAGTAACATTAAATGA 750 
751 AACAGTGGAATTTAAGGTTTCTTTCTATGGGTTTAAAAACCAGATCCTGA 800 
801 GTGATATAGCATTGGATGACATTAGCCTAACATATGGGATTTGTAATATG 850 
851 AGTGTCTATCCAGAACCAACTTTAGTCCCAACTCCTCCACCAGAACTTCC 900 
901 CACGGACTGTGGAGGGCCTCATGACCTGTGGGAGCCAAATACAACATTCA 950 
951 CGTCTATAAACTTCCCAAACAGCTACCCTAATCAGGCTTTCTGTATTTGG 1000 
1001 AATTTAAATGCACAAAAGGGAAAAAATATTCAGCTCCACTTTCAAGAATT 1050 
1051 TGACCTGGAAAATATTGCAGATGTAGTTGAAATCAGAGATGGTGAAGGAG 1100 
1101 ATGATTCCTTGTTCTTAGCTGTGTACACAGGCCCTGGTCCAGTAAACGAT 1150 
1151 GTGTTCTCAACCACCAACCGAATGACTGTGCTTTTTATCACTGATAATAT 1200 
1201 GCTGGCAAAACAGGGATTTAAAGCAAATTTCACTACTGGCTATGGCTTGG 1250 
1251 GGATTCCAGAACCCTGCAAGGAAGACAATTTTCAGTGCAAGGATGGGGAG 1300 
1301 TGTATTCCGCTGGTGAATCTCTGTGACGGTTTTCCACACTGTAAGGATGG 1350 
1351 CTCAGATGAAGCACACTGTGTGCGTCTCTTCAATGGCACGACAGACAGCA 1400 
1401 GTGGTTTGGTGCAGTTCAGGATCCAAAGCATATGGCATGTAGCCTGTGCC 1450 
1451 GAGAACTGGACAACCCAGATCTCAGATGATGTGTGTCAGCTGCTGGGACT 1500 
1501 AGGGACTGGAAACTCATCCGTGCCAACCTTTTCTACTGGAGGTGGACCAT 1550 
1551 ATGTAAATTTAAACACAGCACCTAATGGCAGCTTAATACTAACGCCAAGC 1600 
1601 CAACAGTGCTTAGAGGATTCACTGATTTTGCTACAATGTAACTACAAATC 1650 
1651 ATGTGGGAAAAAACTGGTGACTCAAGAAGTTAGCCCGAAGATTGTCGGAG 1700 
1701 GAAGTGACTCCAGAGAAGGAGCCTGGCCTTGGGTCGTTGCTCTGTATTTC 1750 
1751 GACGATCAACAGGTCTGCGGAGCTTCTCTGGTGAGCAGGGATTGGCTGGT 1800 
1801 GTCGGCCGCCCACTGCGTGTACGGGAGAAATATGGAGCCGTCTAAGTGGA 1850 
1851 AAGCAGTGCTAGGCCTGCATATGGCATCAAATCTGACTTCTCCTCAGATA 1900 
1901 GAAACTAGGTTGATTGACCAAATTGTCATAAACCCACACTACAATAAACG 1950 
1951 GAGAAAGAACAATGACATTGCCATGATGCATCTTGAAATGAAAGTGAACT 2000 
2001 ACACAGATTATATACAGCCTATTTGTTTACCAGAAGAAAATCAAGTTTTT 2050 
2051 CCCCCAGGAAGAATTTGTTCTATTGCTGGCTGGGGGGCACTTATATATCA 2100 
2101 AGGTTCTACTGCAGACGTACTGCAAGAAGCTGACGTTCCCCTTCTATCAA 2150 
2151 ATGAGAAATGTCAACAACAGATGCCAGAATATAACATTACGGAAAATATG 2200 
2201 GTGTGTGCAGGCTATGAAGCAGGAGGGGTAGATTCTTGTCAGGGGGATTC 2250 
2251 AGGCGGACCACTCATGTGCCAAGAAAACAACAGATGGCTCCTGGCTGGCG 2300 
2301 TGACGTCATTTGGATATCAATGTGCACTGCCTAATCGCCCAGGGGTGTAT 2350 
2351 GCCCGGGTCCCAAGGTTCACAGAGTGGATACAAAGTTTTCTACATTAGAG 2400 
2401 TGTTTCCAGAAACAAAGATGAAAATCAGGCAGTTTTCCCATTTCACTTTA 2450 
2451 AGAAGCATGGAAATTGAGAGTTAAAAAAATAATAATTTATAAAAGTCTTG 2500 
2501 ATTCTTACCTAAGGCACTGAAATGCTACAAAAAAAAAAAAACCGGAATTC 2550 
2551 AGCTTGGACTTAACCAGGCTGAACTTGCGGC 2581 
FIGURE 2 

10 30 50 

GACDGRFLLTGSSGSFEALHYPKPSNNTSAVCRWIIRVNQGLSIQLNFDYFNTYYADVLN 

70 90 110 

IYEGMGSSKILRASLWSNNPGIIRIFSNQVTATFLIQSDESDYIGFKVTYTAFNSKELNN 

130 150 170 

YEKINCNFEDGFCFWIQDLNDDNEWERTQGSTFPPSTGPTFDHTFGNESGFYISTPTGPG 

190 210 230 

GRRERVGLLTLPLDPTPEQA#CLSFWYYMYGENVYKLSINISSDQNMEKTIFQKEGNYGQN 

250 270 290 

WNYGQVTLNETVEFKVSFYGFKNQILSDIALDDISLTYGICNMSVYPEPTLVPTPPPELP 

310 330 350 

TDCGGPHDLWEPNTTFTSINFPNSYPNQAFCIWNLNAQKGKNIQLHFQEFDLENIADVVE 

370 390 410 

IRDGEGDDSLFLAVYTGPGPVNDVFSTTNRMTVLFITDNMLAKQGFKANFTTGYGLGIPE 

430 450 470 

PCKEDNFQCKDGECIPLVNLCDGFPHCKDG*SDEAHCVRLFNGTTDSSGLVQFRIQSIWHV 

490 510 530 

ACAENWTTQISDDVCQLLGLGTGNSSVPTFSTGGGPYVNLNTAPNGSLILTPSQQCLEDS 

550 570 590 

LILLQCNYKSCGKKLVTQEVSPKIVGGSDSREGAWPWVVALYFDDQQVCGASLVSRDWLV 

610 630 650 

SAAHCVYGRNMEPSKWKAVLGLHMASNLTSPQIETRLIDQIVINPHYNKRRKNNDIAMMH 

670 690 710 

LEMKVNYTDYIQPICLPEENQVFPPGRICS*IAGWGALIYQGSTADVLQEADVPLLSNEKC 

730 750 770 

QQQMPEYNITENMVCAGYEAGGVDSCQGDS*GGPLMCQENNRWLLAGVTSFGYQCALPNRP 

790 

GVYARVPRFTEWIQSFLH * 